Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Philip Lawatsch
Philip Lawatsch wrote:

>>Anyone have any suggestions on how to track this further? It seems
>>fairly clear what circumstances are causing it, but as for figuring out
>>what's at fault..
> 

> It seems that mov'ing does not kill my machine while simply using movnti
> does.

Forget about what I just wrote, I've been able to reproduce this in
32-bit mode too, although it did take a long while to happen.

And glibc in 32-bit mode simply uses mov in a normal loop to write to the
memory.

It looks like using mov in 64-bit mode polluted my cache and crippled
performance (I have been running some other programs in the background)
and thus perhaps didn't trigger the problem.

I'm going nuts with this.

kind regards Philip
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Arjan van de Ven
On Wed, 2005-04-06 at 12:59 +0200, Philip Lawatsch wrote:
> Robert Hancock wrote:
> > Alan Cox wrote:
> > 
> >> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> >>
> >>> I'm wondering if one does a ton of these cache-bypassing stores
> >>> whether something gets hosed because of that. Not sure what that
> >>> could be though. I don't imagine the chipset is involved with any of
> >>> that on the Athlon 64 - either the CPU or RAM seems the most likely
> >>> suspect to me
> >>
> >>
> >>
> >> The glibc version is essentially the "perfect" copy function for the
> >> CPU. If you have any bus/memory problems or chipset bugs it will bite
> >> you.
> > 
> > 
> > Anyone have any suggestions on how to track this further? It seems
> > fairly clear what circumstances are causing it, but as for figuring out
> > what's at fault..
> 
> Digging through my glibc's source if found that if you memset arrays
> <12 bytes it will use good old mov instructions to do the job. In
> case of arrays larger than 12 bytes it will use movnti instructions
> to do the job.
> 
> Thus I refined my test code to use mov for memset regardless of the size
> (simply abused glibcs code a little bit)
> 
> -> No crash!
> 
> Then, changing the all the mov to movnti and my machine frags again :(
> 
> It seems that mov'ing does not kill my machine while simply using movnti
> does.

movnti also gets a higher bandwidth so that doesn't rule out too much..





Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Philip Lawatsch
Robert Hancock wrote:
> Alan Cox wrote:
> 
>> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
>>
>>> I'm wondering if one does a ton of these cache-bypassing stores
>>> whether something gets hosed because of that. Not sure what that
>>> could be though. I don't imagine the chipset is involved with any of
>>> that on the Athlon 64 - either the CPU or RAM seems the most likely
>>> suspect to me
>>
>>
>>
>> The glibc version is essentially the "perfect" copy function for the
>> CPU. If you have any bus/memory problems or chipset bugs it will bite
>> you.
> 
> 
> Anyone have any suggestions on how to track this further? It seems
> fairly clear what circumstances are causing it, but as for figuring out
> what's at fault..

Digging through my glibc's source, I found that if you memset arrays
<12 bytes it will use good old mov instructions to do the job. For
arrays larger than 12 bytes it will use movnti instructions
to do the job.

Thus I refined my test code to use mov for memset regardless of the size
(I simply abused glibc's code a little bit).

-> No crash!

Then, after changing all the movs to movnti, my machine frags again :(

It seems that mov'ing does not kill my machine, while simply using movnti
does.
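A minimal C sketch of the experiment described above, assuming x86_64 with GCC or Clang: the same 64-byte unrolled pattern as glibc's inner loops, with a flag that forces either ordinary stores (mov) or non-temporal stores (movnti, emitted via the `_mm_stream_si64` intrinsic) regardless of size. The helper name `fill_64byte_blocks` and its `use_movnti` flag are hypothetical; this is not the test code used in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si64 (movnti), _mm_sfence (sfence) */
#endif

/*
 * Hypothetical helper: store `pattern` into `blocks` consecutive 64-byte
 * blocks starting at dst, eight 8-byte stores per block like glibc's loops.
 * use_movnti != 0 selects non-temporal movnti stores (x86_64 only);
 * otherwise each store is an ordinary cached store. dst must be 8-byte aligned.
 */
static void fill_64byte_blocks(void *dst, uint64_t pattern, size_t blocks,
                               int use_movnti)
{
    unsigned char *p = dst;
    assert(((uintptr_t)p & 7) == 0);

    for (; blocks > 0; blocks--, p += 64) {
        for (int off = 0; off < 64; off += 8) {
#if defined(__x86_64__)
            if (use_movnti) {
                _mm_stream_si64((long long *)(p + off), (long long)pattern);
                continue;
            }
#endif
            memcpy(p + off, &pattern, sizeof pattern); /* plain 8-byte store */
        }
    }

#if defined(__x86_64__)
    if (use_movnti)
        _mm_sfence(); /* order the weakly-ordered NT stores before returning */
#else
    (void)use_movnti; /* non-temporal path unavailable off x86_64 */
#endif
}
```

Running a large buffer through this in a loop with `use_movnti` set to 0 and then to 1 leaves the store type as the only variable, which is the comparison made above.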

kind regards Philip



Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Philip Lawatsch
Rafael J. Wysocki wrote:

>>Anyone have any suggestions on how to track this further? It seems 
>>fairly clear what circumstances are causing it, but as for figuring out 
>>what's at fault..
> 
> 
> Well, I would start from changing memory modules.

As I wrote earlier, I tried 4 different (but same brand) modules, 2
Infineon and 2 Samsung ones. No difference.

Btw, I've been working (stressing) the machine for a week now and have
never had any problems; the system seems rock solid (until I start my
memory stresser).

kind regards Philip


Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Denis Vlasenko
[disregard my previous mail. I should have read the whole thread first]

On Saturday 02 April 2005 07:50, Robert Hancock wrote:
> As it turns out, the memset in my version of glibc x86_64 is not using 
> such a string instruction though - it seems to be using two different 
> sets of instructions depending on the size of the memset (not sure 
> exactly how they're calculating the threshold between these..) For sizes 
> below the treshold, this is the inner loop - it's using normal mov 
> instructions:
> 
> 3:/* Copy 64 bytes.  */
>   mov %r8,(%rcx)
>   mov %r8,0x8(%rcx)
>   mov %r8,0x10(%rcx)
>   mov %r8,0x18(%rcx)
>   mov %r8,0x20(%rcx)
>   mov %r8,0x28(%rcx)
>   mov %r8,0x30(%rcx)
>   mov %r8,0x38(%rcx)
>   add $0x40,%rcx
>   dec %rax
>   jne 3b
> 
> For sizes above the threshold though, this is the inner loop. It's using 
> movnti which is an SSE cache-bypasssing store:
> 
> 11:   /* Copy 64 bytes without polluting the cache.  */
>   /* We could use movntdq%xmm0,(%rcx) here to further
>  speed up for large cases but let's not use XMM registers.  */
>   movnti  %r8,(%rcx)
>   movnti  %r8,0x8(%rcx)
>   movnti  %r8,0x10(%rcx)
>   movnti  %r8,0x18(%rcx)
>   movnti  %r8,0x20(%rcx)
>   movnti  %r8,0x28(%rcx)
>   movnti  %r8,0x30(%rcx)
>   movnti  %r8,0x38(%rcx)
>   add $0x40,%rcx
>   dec %rax
>   jne 11b

This is a very rarely used instruction. People either do
plain old rep stosl or use 3DNow! or SSE2 non-temporal stores.

Maybe movnti is different (buggy?) in some subtle way.

Does it blow up if you use 3DNow! or SSE2 non-temporal stores?

If yes, then try a different BIOS (the latest is not necessarily the best).
BTW, the 'Athlon bug' was tracked down similarly: a new BIOS enabled a
buggy chipset feature and BOOM! non-temporal stores killed the box
(it took several months to figure that out back then).
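One way to answer the SSE2 question is a memset built on 16-byte `movntdq` stores. A minimal sketch, assuming a compiler that provides `<emmintrin.h>` and defines `__SSE2__` (the default for x86_64 GCC and Clang); the name `memset_movntdq` is hypothetical, and the 3DNow! variant is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_set1_epi8, _mm_stream_si128 (movntdq), _mm_sfence */
#endif

/* Hypothetical memset variant built on SSE2 16-byte non-temporal stores. */
static void *memset_movntdq(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    assert(dst != NULL || n == 0);
#if defined(__SSE2__)
    /* movntdq needs a 16-byte aligned address: fill single bytes until then. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        *p++ = (unsigned char)c;
        n--;
    }
    const __m128i v = _mm_set1_epi8((char)c); /* fill byte in all 16 lanes */
    for (; n >= 16; n -= 16, p += 16)
        _mm_stream_si128((__m128i *)p, v);
    _mm_sfence(); /* drain the NT stores before anyone reads the buffer */
#endif
    memset(p, c, n); /* 0..15 trailing bytes, or everything without SSE2 */
    return dst;
}
```

If this variant hangs the box as well, the problem is more likely in how the platform handles non-temporal stores in general than in movnti specifically.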
--
vda



Re: AMD64 Machine hardlocks when using memset

2005-04-06 Thread Rafael J. Wysocki
Hi,

On Wednesday, 6 of April 2005 06:05, Robert Hancock wrote:
> Alan Cox wrote:
> > On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> > 
> >>I'm wondering if one does a ton of these cache-bypassing stores whether 
> >>something gets hosed because of that. Not sure what that could be 
> >>though. I don't imagine the chipset is involved with any of that on the 
> >>Athlon 64 - either the CPU or RAM seems the most likely suspect to me
> > 
> > 
> > The glibc version is essentially the "perfect" copy function for the
> > CPU. If you have any bus/memory problems or chipset bugs it will bite
> > you.
> 
> Anyone have any suggestions on how to track this further? It seems 
> fairly clear what circumstances are causing it, but as for figuring out 
> what's at fault..

Well, I would start by changing the memory modules.

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"


Re: AMD64 Machine hardlocks when using memset

2005-04-05 Thread Robert Hancock
Alan Cox wrote:
> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
>> I'm wondering if one does a ton of these cache-bypassing stores whether 
>> something gets hosed because of that. Not sure what that could be 
>> though. I don't imagine the chipset is involved with any of that on the 
>> Athlon 64 - either the CPU or RAM seems the most likely suspect to me
> 
> The glibc version is essentially the "perfect" copy function for the
> CPU. If you have any bus/memory problems or chipset bugs it will bite
> you.

Anyone have any suggestions on how to track this further? It seems 
fairly clear what circumstances are causing it, but as for figuring out 
what's at fault..

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-04-04 Thread Alan Cox
On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> I'm wondering if one does a ton of these cache-bypassing stores whether 
> something gets hosed because of that. Not sure what that could be 
> though. I don't imagine the chipset is involved with any of that on the 
> Athlon 64 - either the CPU or RAM seems the most likely suspect to me

The glibc version is essentially the "perfect" copy function for the
CPU. If you have any bus/memory problems or chipset bugs it will bite
you.

Alan



Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Robert Hancock
Paul Jackson wrote:
> The x86_64 memset(), both in user space and the kernel, for whatever gcc
> I have, and for a current kernel, uses the "repz stos" or "rep stosq"
> prefixed instruction for the bulk of the copy.  This combination is a
> long running, interruptible Intel string instruction that loops on
> itself until the CX register decrements to zero.
> 
> Was your windows app using "stos"?
> 
> I'll wager a nickel that the actual crash you see comes when the
> processor has to handle an interrupt while in the middle of this
> instruction.
> 
> I'll wager a dime it's hardware, though interrupt activity may be
> required to provoke it.

I ended up making a test program which essentially did the same thing 
except not using memset (just moving an int* up repeatedly and setting 
the value there to 0). That worked fine on both Windows and Linux. I 
then tried such a program using a long* compiled as 64-bit on Linux, 
that also worked fine. It seems like I can only reproduce it when memset 
is actually used..
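A plausible reconstruction of that pointer-walking test, with hypothetical function names. A loop like this normally compiles to ordinary cached stores, which would be consistent with it not reproducing the hang; note, however, that current GCC and Clang can recognise such a loop and replace it with a call to memset, so the sketch writes through a volatile pointer to keep the stores as written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of the non-memset test: walk an int pointer up
   the buffer and store 0 at each step. The volatile qualifier stops GCC and
   Clang from recognising the loop as a memset idiom and calling memset anyway. */
static void zero_ints_by_pointer(int *buf, size_t count)
{
    assert(buf != NULL);
    volatile int *p = buf;
    volatile int *const end = buf + count;
    while (p < end)
        *p++ = 0;
}

/* The same loop with long, i.e. 8-byte stores when built for x86_64 Linux. */
static void zero_longs_by_pointer(long *buf, size_t count)
{
    assert(buf != NULL);
    volatile long *p = buf;
    volatile long *const end = buf + count;
    while (p < end)
        *p++ = 0;
}
```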

I don't remember exactly what the Windows memset was using, that was on 
my work machine - it was inline assembly though, and I do know that it 
had only one instruction for the whole set, so it was likely "repz stos" 
or something similar to that.

As it turns out, the memset in my version of glibc x86_64 is not using 
such a string instruction though - it seems to be using two different 
sets of instructions depending on the size of the memset (not sure 
exactly how they're calculating the threshold between these..) For sizes 
below the threshold, this is the inner loop - it's using normal mov 
instructions:

3:	/* Copy 64 bytes.  */
	mov	%r8,(%rcx)
	mov	%r8,0x8(%rcx)
	mov	%r8,0x10(%rcx)
	mov	%r8,0x18(%rcx)
	mov	%r8,0x20(%rcx)
	mov	%r8,0x28(%rcx)
	mov	%r8,0x30(%rcx)
	mov	%r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	3b

For sizes above the threshold though, this is the inner loop. It's using 
movnti, which is an SSE2 cache-bypassing store:

11:	/* Copy 64 bytes without polluting the cache.  */
	/* We could use	movntdq    %xmm0,(%rcx) here to further
	   speed up for large cases but let's not use XMM registers.  */
	movnti	%r8,(%rcx)
	movnti	%r8,0x8(%rcx)
	movnti	%r8,0x10(%rcx)
	movnti	%r8,0x18(%rcx)
	movnti	%r8,0x20(%rcx)
	movnti	%r8,0x28(%rcx)
	movnti	%r8,0x30(%rcx)
	movnti	%r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	11b

I'm wondering if one does a ton of these cache-bypassing stores whether 
something gets hosed because of that. Not sure what that could be 
though. I don't imagine the chipset is involved with any of that on the 
Athlon 64 - either the CPU or RAM seems the most likely suspect to me
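The size-dependent choice between the two loops can be sketched in C as follows. This is an illustration under stated assumptions, not glibc's code: the cut-over constant `DEMO_NT_THRESHOLD` is a hypothetical placeholder (the thread does not establish the real value), and `_mm_stream_si64` stands in for the hand-written movnti loop on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si64 (movnti), _mm_sfence */
#endif

/* Hypothetical cut-over size; glibc's real threshold is not reproduced here. */
#define DEMO_NT_THRESHOLD ((size_t)64 * 1024)

/* memset-like fill that picks cached or non-temporal 8-byte stores by size. */
static void *demo_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    /* Broadcast the fill byte to all 8 bytes, like the value kept in %r8. */
    uint64_t pattern = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
    assert(dst != NULL || n == 0);

    /* Head: single bytes until p is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = (unsigned char)c;
        n--;
    }

    size_t words = n / 8;
#if defined(__x86_64__)
    if (n >= DEMO_NT_THRESHOLD) {
        /* Large fill: movnti stores that bypass the cache. */
        for (size_t i = 0; i < words; i++)
            _mm_stream_si64((long long *)(p + 8 * i), (long long)pattern);
        _mm_sfence();
    } else
#endif
    {
        /* Small fill (or any size off x86_64): ordinary cached stores. */
        for (size_t i = 0; i < words; i++)
            memcpy(p + 8 * i, &pattern, sizeof pattern);
    }
    p += 8 * words;
    n -= 8 * words;

    /* Tail: the remaining 0..7 bytes. */
    while (n-- > 0)
        *p++ = (unsigned char)c;
    return dst;
}
```

With this shape, moving the hypothetical threshold shifts a given buffer size between the cached and non-temporal paths without touching the rest of a test.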

--
Robert Hancock  Saskatoon, SK, Canada


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Paul Jackson
Robert wrote:
> It does run visibly slower

The x86_64 memset(), both in user space and the kernel, for whatever gcc
I have, and for a current kernel, uses the "repz stos" or "rep stosq"
prefixed instruction for the bulk of the copy.  This combination is a
long running, interruptible Intel string instruction that loops on
itself until the CX register decrements to zero.

Was your windows app using "stos"?

I'll wager a nickel that the actual crash you see comes when the
processor has to handle an interrupt while in the middle of this
instruction.

I'll wager a dime it's hardware, though interrupt activity may be
required to provoke it.
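For comparison with the glibc loops quoted in this thread, a `rep stosq` fill can be written with GCC/Clang extended inline asm. A minimal sketch for x86_64; the name `fill_rep_stosq` is hypothetical. Because the instruction keeps its progress in RDI and RCX, the CPU can take an interrupt part-way through and resume the fill afterwards, which is the situation the wager above is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store `pattern` into `qwords` 8-byte slots with a
   single "rep stosq". RDI holds the destination, RCX the count and RAX the
   value; the CPU advances RDI and decrements RCX as it goes, which is why
   both are read-write ("+") operands. The x86_64 SysV ABI guarantees the
   direction flag is clear on function entry, so the fill runs upwards. */
static void *fill_rep_stosq(void *dst, uint64_t pattern, size_t qwords)
{
    assert(dst != NULL || qwords == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;
    size_t cnt = qwords;
    __asm__ volatile("rep stosq"
                     : "+D"(d), "+c"(cnt)
                     : "a"(pattern)
                     : "memory");
#else
    /* Portable fallback off x86_64: ordinary 8-byte stores. */
    unsigned char *p = dst;
    for (size_t i = 0; i < qwords; i++)
        memcpy(p + 8 * i, &pattern, sizeof pattern);
#endif
    return dst;
}
```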

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Robert Hancock
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
>> This is getting pretty ridiculous.. I've tried memory timings down to 
>> the slowest possible, ran Memtest86 for 4 passes with no errors, and 
>> it's been stable in Windows for a few months now. Still something is 
>> blowing up in Linux with this test though..
> 
> Have you run the same memset test under windows?
> 
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

I've now built a similar test program for Windows. I've let it run over 
2000 iterations of 512MB memsets with no problems. On Linux it usually 
blew up in under 200 iterations. It does run visibly slower than the
Linux version, though; this is, after all, 32-bit Windows, and it was
compiled with crufty old Visual C++ 6.0, so it is probably not well
optimized for this CPU. I will see if I can get a more optimized build
of this to try with Mingw32 or something. After all, if it's related to
some instruction combination, it may not show up in the build I have.
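A stress loop along those lines might look like the sketch below; the name `memset_stress`, the alternating 0x00/0xFF fills, and the byte-by-byte read-back are assumptions, not the test program from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stress loop: memset a `bytes`-sized heap buffer `iterations`
   times, alternating 0x00 and 0xFF, and read every byte back after each pass.
   Returns the number of clean passes, or -1 if the allocation fails. A hard
   lock never returns, of course; the read-back only catches silent corruption. */
static long memset_stress(size_t bytes, unsigned iterations)
{
    assert(bytes > 0);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    /* Volatile reads keep the compiler from assuming the memset result. */
    const volatile unsigned char *check = buf;
    long passes = 0;
    for (unsigned it = 0; it < iterations; it++) {
        const unsigned char fill = (it & 1u) ? 0xFF : 0x00;
        memset(buf, fill, bytes);
        for (size_t i = 0; i < bytes; i++) {
            if (check[i] != fill) {
                fprintf(stderr, "pass %u: byte %zu is 0x%02x, expected 0x%02x\n",
                        it, i, (unsigned)check[i], (unsigned)fill);
                free(buf);
                return passes;
            }
        }
        passes++;
    }
    free(buf);
    return passes;
}
```

For example, `memset_stress((size_t)512 << 20, 2000)` mirrors the 2000 x 512MB run described above.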

--
Robert Hancock  Saskatoon, SK, Canada


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Philip Lawatsch
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> 
>>This is getting pretty ridiculous.. I've tried memory timings down to 
>>the slowest possible, ran Memtest86 for 4 passes with no errors, and 
>>it's been stable in Windows for a few months now. Still something is 
>>blowing up in Linux with this test though..
> 
> 
> Have you run the same memset test under windows?
> 
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

So far I've tried 2 PSUs and 3 different brands of memory.

No difference. And since I don't have Windows, I can't really test it.

I'll try a different (not nForce4-based) motherboard now.


kind regards Philip


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Ray Lee
On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> This is getting pretty ridiculous.. I've tried memory timings down to 
> the slowest possible, ran Memtest86 for 4 passes with no errors, and 
> it's been stable in Windows for a few months now. Still something is 
> blowing up in Linux with this test though..

Have you run the same memset test under windows?

I've traced a lot of oddball problems down to bad or marginal power
supplies.

Ray



Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Denis Vlasenko
On Friday 01 April 2005 07:37, Robert Hancock wrote:
> Stelian Pop wrote:
> > Just a thought: does deactivating cpufreq change anything ?
> > 
> > I haven't tested yet your program, but on my Asus K8NE-Deluxe very
> > strange things happen if cpufreq/powernow is activated *and* 
> > the cpu frequency is changed...
> 
> Didn't change anything for me, I tried deactivating cpufreq, still 
> crashes when I run that test program.
> 
> This is getting pretty ridiculous.. I've tried memory timings down to 
> the slowest possible, ran Memtest86 for 4 passes with no errors, and 
> it's been stable in Windows for a few months now. Still something is 
> blowing up in Linux with this test though..

If you want to dig deeper, go to assembler level.
That is, instead of using memset(), disassemble
your program and make your own

void my_memset(...)
{
asm volatile(/* code sequence from your crashing prog*/);
}

and use that in your memsetting loop. Sure, it won't change anything,
but:

a) we will know exactly which instruction sequence drives
   your CPU/chipset crazy
b) others can try to reproduce without danger of memset being
   implemented differently in their particular version of gcc/glibc/whatever
c) you can try other memsets in order to learn more about this bug
   (for example, if inserting some NOPs into the my_memset body
   makes the bug disappear, that will definitely point towards a defective/
   overheating CPU, etc.)
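A fleshed-out version of the suggested `my_memset`, assuming x86_64 and GCC or Clang extended inline asm: it reproduces the glibc movnti loop quoted in this thread (with a local label, and an `sfence` appended so the weakly ordered stores are drained before returning). The name `my_memset_movnti` and the `DEMO_NOP_PADDING` switch for the NOP experiment in c) are hypothetical; other targets fall back to plain stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Define DEMO_NOP_PADDING (hypothetical) to put two NOPs after every store,
   for the experiment in point c). */
#ifdef DEMO_NOP_PADDING
#define PAD "nop\n\tnop\n\t"
#else
#define PAD ""
#endif

/* Store `pattern` into `blocks` 64-byte blocks at dst (8-byte aligned) using
   the same movnti sequence as the glibc loop. */
static void my_memset_movnti(void *dst, uint64_t pattern, size_t blocks)
{
    assert(((uintptr_t)dst & 7) == 0);
    if (blocks == 0)
        return; /* dec/jne below would otherwise wrap and run ~2^64 times */
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "1:\n\t"
        "movnti %[v], 0x00(%[p])\n\t" PAD
        "movnti %[v], 0x08(%[p])\n\t" PAD
        "movnti %[v], 0x10(%[p])\n\t" PAD
        "movnti %[v], 0x18(%[p])\n\t" PAD
        "movnti %[v], 0x20(%[p])\n\t" PAD
        "movnti %[v], 0x28(%[p])\n\t" PAD
        "movnti %[v], 0x30(%[p])\n\t" PAD
        "movnti %[v], 0x38(%[p])\n\t" PAD
        "add $0x40, %[p]\n\t"
        "dec %[n]\n\t"
        "jne 1b\n\t"
        "sfence"
        : [p] "+r"(dst), [n] "+r"(blocks)
        : [v] "r"(pattern)
        : "memory", "cc");
#else
    /* Portable fallback off x86_64: plain stores, no movnti. */
    unsigned char *q = dst;
    for (size_t i = 0; i < blocks * 8; i++)
        memcpy(q + 8 * i, &pattern, sizeof pattern);
#endif
}
```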
--
vda



Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Ray Lee
On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> This is getting pretty ridiculous.. I've tried memory timings down to 
> the slowest possible, ran Memtest86 for 4 passes with no errors, and 
> it's been stable in Windows for a few months now. Still something is 
> blowing up in Linux with this test though..

Have you run the same memset test under windows?

I've traced a lot of oddball problems down to bad or marginal power
supplies.

Ray



Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Philip Lawatsch
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> 
>>This is getting pretty ridiculous.. I've tried memory timings down to 
>>the slowest possible, ran Memtest86 for 4 passes with no errors, and 
>>it's been stable in Windows for a few months now. Still something is 
>>blowing up in Linux with this test though..
> 
> 
> Have you run the same memset test under windows?
> 
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

So far I've tried 2 PSUs and 3 different brands of memory.

No differences. And since I don't have Windows, I can't really test it there.

I'll try a different motherboard (one not based on nForce 4) now.


kind regards Philip


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Robert Hancock
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> > This is getting pretty ridiculous.. I've tried memory timings down to 
> > the slowest possible, ran Memtest86 for 4 passes with no errors, and 
> > it's been stable in Windows for a few months now. Still something is 
> > blowing up in Linux with this test though..
> 
> Have you run the same memset test under windows?
> 
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

I've now built a similar test program for Windows. I've let it run over 
2000 iterations of 512MB memsets with no problems. On Linux it usually 
blew up with under 200 iterations. It does run visibly slower than the 
Linux version though - this is after all 32 bit Windows and it was 
compiled with crufty old Visual C++ 6.0 so it is probably not that 
optimized for this CPU. I will see if I can get a more optimized build 
of this to try in Mingw32 or something.. after all if it's related to 
some instruction combination or something it may not show up in the 
build I have.
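A minimal sketch of such a cross-platform memset stress loop, assuming only standard C99 so the same file can be built with gcc on Linux and with MinGW on Windows. The original test program is not reproduced here, so the function name, the alternating fill pattern and the per-pass verification are assumptions of this sketch, not anyone's actual test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Calling memset() through a volatile function pointer stops the compiler
 * from inlining it or deleting fills it can prove are never read, so the
 * C library's own memset() really runs on every pass. */
static void *(*volatile memset_fn)(void *, int, size_t) = memset;

/* Run up to `iterations` fill passes over a `bytes`-sized heap buffer,
 * alternating 0x00 and 0xff, and verify each pass. Returns the number of
 * passes that completed and verified, or -1 if allocation fails. */
long memset_stress(size_t bytes, long iterations, int verbose)
{
    unsigned char *buf;
    long done;

    assert(bytes > 0);
    buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    for (done = 0; done < iterations; done++) {
        unsigned char val = (done & 1) ? 0xff : 0x00;

        memset_fn(buf, val, bytes);
        for (size_t i = 0; i < bytes; i++) {
            if (buf[i] != val) {
                fprintf(stderr, "pass %ld: byte %lu reads 0x%02x\n",
                        done + 1, (unsigned long)i, (unsigned)buf[i]);
                free(buf);
                return done;
            }
        }
        if (verbose) {
            printf("pass %ld ok\n", done + 1);
            fflush(stdout); /* flush so the last count survives a hang */
        }
    }
    free(buf);
    return done;
}
```

For example, memset_stress(512UL * 1024 * 1024, 2000, 1) roughly matches the 2000 passes of 512MB described above; the flushed counter shows how far a run got before a lockup.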

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Paul Jackson
Robert wrote:
> It does run visibly slower

The x86_64 memset(), both in user space and the kernel, for whatever gcc
I have, and for a current kernel, uses the repz stos or rep stosq
prefixed instruction for the bulk of the copy.  This combination is a
long running, interruptible Intel string instruction that loops on
itself until the CX register decrements to zero.

Was your windows app using stos?

I'll wager a nickel that the actual crash you see comes when the
processor has to handle an interrupt while in the middle of this
instruction.

I'll wager a dime it's hardware, though interrupt activity may be
required to provoke it.
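One way to test this theory without depending on a particular glibc or gcc: a minimal sketch, assuming GCC inline asm on x86_64, that does the whole fill with a single rep stosq, the kind of interruptible string instruction described above. The function name is hypothetical, and nothing here establishes which form any particular memset() actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `qwords` 64-bit words at `dst` with `val` using one rep stosq:
 * RDI = destination, RCX = count, RAX = value. The x86_64 ABI requires
 * the direction flag to be clear on function entry, so the fill runs
 * upward through memory. */
void stosq_fill(void *dst, uint64_t val, size_t qwords)
{
    assert(dst != NULL || qwords == 0);
#if defined(__x86_64__)
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(qwords)
                     : "a"(val)
                     : "memory");
#else
    /* Fallback only so the file builds on other architectures. */
    uint64_t *p = dst;
    for (size_t i = 0; i < qwords; i++)
        p[i] = val;
#endif
}
```

Swapping this in for memset() in a stress loop, and comparing against a MOVNTI-based variant, would help separate the "long string instruction plus interrupt" hypothesis from a cache-bypassing-store one.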

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-04-01 Thread Robert Hancock
Paul Jackson wrote:
> The x86_64 memset(), both in user space and the kernel, for whatever gcc
> I have, and for a current kernel, uses the repz stos or rep stosq
> prefixed instruction for the bulk of the copy.  This combination is a
> long running, interruptible Intel string instruction that loops on
> itself until the CX register decrements to zero.
> 
> Was your windows app using stos?
> 
> I'll wager a nickel that the actual crash you see comes when the
> processor has to handle an interrupt while in the middle of this
> instruction.
> 
> I'll wager a dime it's hardware, though interrupt activity may be
> required to provoke it.

I ended up making a test program which essentially did the same thing 
except not using memset (just moving an int* up repeatedly and setting 
the value there to 0). That worked fine on both Windows and Linux. I 
then tried such a program using a long* compiled as 64-bit on Linux, 
that also worked fine. It seems like I can only reproduce it when memset 
is actually used..
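A minimal sketch of the plain-store control test described above (the function name is hypothetical). One design choice worth noting: the stores go through a volatile pointer, because an optimizing compiler may recognize a plain zeroing loop and replace it with a call to memset(), which would quietly turn the control case back into the case under test.

```c
#include <assert.h>
#include <stddef.h>

/* Store 0 into each int of the buffer in turn: one ordinary store per
 * element and no memset() involved. The volatile pointer keeps the
 * compiler from fusing the loop back into a memset() call. */
void plain_zero(int *buf, size_t count)
{
    volatile int *p = buf;

    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        p[i] = 0;
}
```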

I don't remember exactly what the Windows memset was using, that was on 
my work machine - it was inline assembly though, and I do know that it 
had only one instruction for the whole set, so it was likely repz stos 
or something similar to that.

As it turns out, the memset in my version of glibc x86_64 is not using 
such a string instruction though - it seems to be using two different 
sets of instructions depending on the size of the memset (not sure 
exactly how they're calculating the threshold between these..) For sizes 
below the threshold, this is the inner loop - it's using normal mov 
instructions:

3:  /* Copy 64 bytes.  */
mov %r8,(%rcx)
mov %r8,0x8(%rcx)
mov %r8,0x10(%rcx)
mov %r8,0x18(%rcx)
mov %r8,0x20(%rcx)
mov %r8,0x28(%rcx)
mov %r8,0x30(%rcx)
mov %r8,0x38(%rcx)
add $0x40,%rcx
dec %rax
jne 3b

For sizes above the threshold though, this is the inner loop. It's using 
movnti, which is an SSE2 cache-bypassing store:

11: /* Copy 64 bytes without polluting the cache.  */
/* We could use movntdq %xmm0,(%rcx) here to further
   speed up for large cases but let's not use XMM registers.  */
movnti  %r8,(%rcx)
movnti  %r8,0x8(%rcx)
movnti  %r8,0x10(%rcx)
movnti  %r8,0x18(%rcx)
movnti  %r8,0x20(%rcx)
movnti  %r8,0x28(%rcx)
movnti  %r8,0x30(%rcx)
movnti  %r8,0x38(%rcx)
add $0x40,%rcx
dec %rax
jne 11b

I'm wondering if one does a ton of these cache-bypassing stores whether 
something gets hosed because of that. Not sure what that could be 
though. I don't imagine the chipset is involved with any of that on the 
Athlon 64, since its memory controller is on the CPU itself - either the 
CPU or RAM seems the most likely suspect to me.
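To exercise the large-size path without depending on a particular glibc build, a minimal sketch assuming GCC inline asm on x86_64: it issues the same kind of 8-byte MOVNTI stores as the loop above, eight per 64-byte block, then an SFENCE, because non-temporal stores are weakly ordered. The function name and the whole-64-byte-blocks interface are assumptions of this sketch; it is not glibc's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `blocks` 64-byte blocks at `dst` with the 64-bit pattern `val`
 * using non-temporal (cache-bypassing) stores, like the loop above. */
void movnti_fill(void *dst, uint64_t val, size_t blocks)
{
    uint64_t *p = dst;
    size_t words = blocks * 8; /* eight 8-byte stores per 64-byte block */

    assert(dst != NULL || blocks == 0);
#if defined(__x86_64__)
    for (size_t i = 0; i < words; i++)
        __asm__ volatile("movnti %1, %0" : "=m"(p[i]) : "r"(val));
    /* Non-temporal stores are weakly ordered; SFENCE orders them ahead
     * of any later stores. */
    __asm__ volatile("sfence" ::: "memory");
#else
    /* Fallback so the file builds elsewhere; these are ordinary stores. */
    for (size_t i = 0; i < words; i++)
        p[i] = val;
#endif
}
```

Running this in place of memset() on the suspect box, against the plain-mov variant, would show whether the cache-bypassing stores are what matters.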

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Robert Hancock
Stelian Pop wrote:
> Just a thought: does deactivating cpufreq change anything ?
> I haven't tested yet your program, but on my Asus K8NE-Deluxe very
> strange things happen if cpufreq/powernow is activated *and* 
> the cpu frequency is changed...

Didn't change anything for me, I tried deactivating cpufreq, still 
crashes when I run that test program.

This is getting pretty ridiculous.. I've tried memory timings down to 
the slowest possible, ran Memtest86 for 4 passes with no errors, and 
it's been stable in Windows for a few months now. Still something is 
blowing up in Linux with this test though..

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Robert Hancock
Philip Lawatsch wrote:
> I've now tried the most conservative settings available. The 32-bit
> kernel now hangs after about 15 iterations (compared to about 16000
> before) but the 64-bit kernel still hangs after about 5000.

I'm still seeing this on my system as well, using the most conservative 
timings possible (DDR200, all delay parameters except the refresh time 
set to the largest possible value) as well as DDR333 with the same 
timings and DDR400 with everything set to auto. I also tried the kernel 
on the Fedora Core 3 rescue disc (same crash) and in single user mode 
(same crash).

So far, the crashes have consisted of either a hang, reboot or panic. 
One panic was a "spinlock already locked at kernel/module.c:2022" error. 
The other one is below, for what it's worth:

Mar 31 18:55:43 Newcastle kernel: Unable to handle kernel paging request 
at 8100588f5000 RIP:
Mar 31 18:55:43 Newcastle kernel: 80236ac7{clear_page+7}
Mar 31 18:55:43 Newcastle kernel: PGD 8063 PUD a063 PMD 0
Mar 31 18:55:43 Newcastle kernel: Oops: 0002 [1]
Mar 31 18:55:43 Newcastle kernel: CPU 0
Mar 31 18:55:43 Newcastle kernel: Modules linked in: md5(U) ipv6(U) 
parport_pc(U) lp(U) parport(U) autofs4(U) it87(U) i2c_sensor(U) 
i2c_isa(U) i2c_dev(U) i2c_core(U) sunrpc(U) pcmcia(U) yenta_socket(U) 
rsrc_nonstatic(U) pcmcia_core(U) joydev(U) nls_utf8(U) ntfs(U) vfat(U) 
fat(U) dm_mod(U) video(U) button(U) battery(U) ac(U) usb_storage(U) 
ohci1394(U) ieee1394(U) ohci_hcd(U) ehci_hcd(U) snd_ice1724(U) 
snd_ice17xx_ak4xxx(U) snd_ac97_codec(U) snd_pcm_oss(U) snd_mixer_oss(U) 
snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd_ak4xxx_adda(U) 
snd_mpu401_uart(U) snd_rawmidi(U) snd_seq_device(U) snd(U) soundcore(U) 
forcedeth(U) floppy(U) ext3(U) jbd(U) sata_nv(U) libata(U) sd_mod(U) 
scsi_mod(U)
Mar 31 18:55:43 Newcastle kernel: Pid: 4928, comm: crashtest Not tainted 
2.6.11-1.7_FC3custom
Mar 31 18:55:43 Newcastle kernel: RIP: 0010:[80236ac7] 
80236ac7{clear_page+7}
Mar 31 18:55:43 Newcastle kernel: RSP: :810078299ca0  EFLAGS: 
00010246
Mar 31 18:55:43 Newcastle kernel: RAX:  RBX: 
0001 RCX: 0200
Mar 31 18:55:43 Newcastle kernel: RDX: 80478940 RSI: 
 RDI: 8100588f5000
Mar 31 18:55:43 Newcastle kernel: RBP: 81000235f5d0 R08: 
 R09: 
Mar 31 18:55:43 Newcastle kernel: R10: 000552fa R11: 
 R12: 8100
Mar 31 18:55:43 Newcastle kernel: R13: 81000235f598 R14: 
6db6db6db6db6db7 R15: 
Mar 31 18:55:43 Newcastle kernel: FS:  2aabeb00() 
GS:80552300() knlGS:
Mar 31 18:55:43 Newcastle kernel: CS:  0010 DS:  ES:  CR0: 
8005003b
Mar 31 18:55:43 Newcastle kernel: CR2: 8100588f5000 CR3: 
791b CR4: 06e0
Mar 31 18:55:43 Newcastle kernel: Process crashtest (pid: 4928, 
threadinfo 810078298000, task 8100788d67e0)
Mar 31 18:55:43 Newcastle kernel: Stack: 80170bc2 
0019 0286 
Mar 31 18:55:43 Newcastle kernel:000a 
80d2000a 0286 0256
Mar 31 18:55:43 Newcastle kernel:80478bc0 
Mar 31 18:55:43 Newcastle kernel: Call 
Trace:80170bc2{buffered_rmqueue+1154} 
80170dac{__alloc_pages+220}
Mar 31 18:55:43 Newcastle kernel: 
80181c52{do_no_page+370} 801825c0{handle_mm_fault+560}
Mar 31 18:55:43 Newcastle kernel: 
80284f9c{write_chan+860} 80123834{do_page_fault+1044}
Mar 31 18:55:43 Newcastle kernel: 
803a3699{thread_return+41} 8010f58d{error_exit+0}
Mar 31 18:55:43 Newcastle kernel:
Mar 31 18:55:43 Newcastle kernel:
Mar 31 18:55:43 Newcastle kernel: Code: f3 48 ab c3 66 66 66 90 66 66 66 
90 66 66 66 90 66 66 66 90
Mar 31 18:55:43 Newcastle kernel: RIP 80236ac7{clear_page+7} 
RSP 810078299ca0
Mar 31 18:55:43 Newcastle kernel: CR2: 8100588f5000

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Paul Jackson
> your memory timings are out of spec.

I don't know what spec applies here, don't really care.
But when I backed off my Memory Timing from 1T to 2T,
my box became stable running this memset() test.

So I am a happy camper, grateful that someone posted
this nice test, and agree with you that it was a memory
timing issue, at least for my system.

Apparently Philip's box has additional "issues".  Whatever.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Stelian Pop
On Thu, Mar 31, 2005 at 12:04:59AM +0200, Philip Lawatsch wrote:

> I do have a very strange problem:
> 
> If I memset a ~1meg buffer some thousand times (in the userspace) it
> will hardlock my machine.
> 
> I've been using 2.6.12-rc1 and also a lot of other kernels (2.6.9,
> 2.6.11). I've tried it both using a 32 bit kernel and a 64 bit kernel.
> When running on the 32 bit kernel the machine hardlocks after about
> 15000 iterations, on a 64 bit kernel the machine hardlocks after about
> 5000 (the 64 bit system has nearly no background jobs running).
> 
> I've been running memcheck for several hours now but nothing did show up.
> 
> 
> I've got an Asus A8N-SLI board with 2 gigs of memory and an AMD 3500+ CPU.
> 
> The 64 bit kernel was compiled using gcc 3.4.3 and the 32 bit kernel
> using 3.3.5.
[...]

> powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.00.09e)
> powernow-k8:0 : fid 0xe (2200 MHz), vid 0x6 (1400 mV)
> powernow-k8:1 : fid 0xc (2000 MHz), vid 0x8 (1350 mV)
> powernow-k8:2 : fid 0xa (1800 MHz), vid 0xa (1300 mV)
> powernow-k8:3 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
> cpu_init done, current fid 0xe, vid 0x6

Just a thought: does deactivating cpufreq change anything ?

I haven't tested yet your program, but on my Asus K8NE-Deluxe very
strange things happen if cpufreq/powernow is activated *and* 
the cpu frequency is changed...

Stelian.
-- 
Stelian Pop <[EMAIL PROTECTED]>


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Mikael Pettersson
Paul Jackson writes:
 > Yup - kills my x86_64 too.  I can't stay up for half a minute.
...
 > My mainboard is an MSI K8N Neo2 Platinum.

I've tested both versions of the test program on two Athlon64 boxes,
and neither has had any problems with them.

My two machines are both VIA K8T800-based (a desktop and a laptop),
but it seems those of you who had problems have nForce-based machines.
So presumably it's either the nForce chipset or your memory timings are
out of spec.

/Mikael


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Paul Jackson
Your problem is almost certainly in the hardware area (cpu, bios,
memory, power, northbridge, motherboard, cooling or thereabouts).

> IMO memtest86 should not hang unless something screws up [its] memory area

There is nothing else running when memtest runs.  You cannot assume
that your hardware is operating like a sane digital computer when
memtest hangs - the magic of zeros, ones and instruction set
architectures is coming unglued and you are getting a glimpse of the
ugliness that is usually hidden behind the curtain.

Good luck fixing it.

LKML is probably not the place to continue to analyze this, now that
you've recreated it with memtest as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Philip Lawatsch
Paul Jackson wrote:
> Denis wrote:
> 
>>This reminds me of a VIA northbridge problem where the BIOS enabled
>>a feature which was experimental and turned out to be buggy.
> 
> 
> You were close!
> 
> I changed my Memory Timing from 1T to 2T, and now it is as solid as a
> rock.  It has been up 7 minutes as I type this, without a hiccup.
> 
> Notice this comment, at http://www.vr-zone.com.sg/?i=1641&p=1&s=0
> 
> Well as most Athlon 64 users know, 1T setting improves performance quite
> significantly over 2T, but it is also very taxing on the memory and
> quite a hit-and-miss when matching different memory with different
> boards. From some users' feedback, the Asus A8N SLI can be a little
> picky with 1T setting when overclocking, so results might be a little
> better with other boards.
> 

I've now tried the most conservative settings available. The 32-bit
kernel now hangs after about 15 iterations (compared to about 16000
before) but the 64-bit kernel still hangs after about 5000.

After a ~12 hour memtest86 run, memtest86 crashed (!), filling the
console with some garbage characters and then hanging.

This is driving me crazy.

IMO memtest86 should not hang unless something screws up the memory area
it is loaded into.

I've also tried the newest beta BIOS for the board now; it didn't change
anything.

kind regards Philip


Re: AMD64 Machine hardlocks when using memset

2005-03-31 Thread Paul Jackson
Denis wrote:
> This reminds me of a VIA northbridge problem where the BIOS enabled
> a feature which was experimental and turned out to be buggy.

You were close!

I changed my Memory Timing from 1T to 2T, and now it is as solid as a
rock.  It has been up 7 minutes as I type this, without a hiccup.

Notice this comment, at http://www.vr-zone.com.sg/?i=1641&p=1&s=0

Well as most Athlon 64 users know, 1T setting improves performance quite
significantly over 2T, but it is also very taxing on the memory and
quite a hit-and-miss when matching different memory with different
boards. From some users' feedback, the Asus A8N SLI can be a little
picky with 1T setting when overclocking, so results might be a little
better with other boards.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Paul Jackson
Yup - kills my x86_64 too.  I can't stay up for half a minute.
I got a couple of oopses:

  Unable to handle kernel paging request at 2730 RIP:

  Unable to handle kernel paging request at 81773ffc6918 RIP:

The first try ended with a sudden reboot.  The second time, I ctrl-C'd
out while I still had a responsive system.

I thought it might be a CPU temperature issue, so I downloaded XMBmon
("Mother Board Monitor Program for X Window System") and hacked its
command-line tool, mbmon, to add this memset loop and report the CPU
temperature on each pass through the loop.

My CPU temperature went from its usual 39 C at idle to 45 C during the
memset loop, which are typical temperatures for this PC.  No problem there.
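
A rough C sketch of that kind of instrumented loop follows. It is not Paul's actual mbmon patch: mbmon talks to the sensor chip itself, whereas this sketch assumes a hypothetical text file holding one reading in millidegrees Celsius (for example a hwmon "temp1_input" node on a newer kernel). The function names read_millidegrees() and memset_with_temperature() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one temperature value, in millidegrees
 * Celsius, from a text file. The file path is an assumption; mbmon does
 * not work this way. Returns -1 if the file cannot be opened or parsed
 * (readings below 0 C are also treated as errors in this sketch). */
long read_millidegrees(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical stress loop: zero buf `passes` times and print the sensor
 * reading after each pass, so a temperature climb just before a lockup
 * would show up in the output. */
void memset_with_temperature(char *buf, size_t size, int passes,
                             const char *sensor_path)
{
    int i;

    for (i = 0; i < passes; ++i) {
        long t;

        memset(buf, 0, size);
        t = read_millidegrees(sensor_path);
        if (t < 0)
            printf("pass %d: temperature unavailable\n", i);
        else
            printf("pass %d: %ld.%03ld C\n", i, t / 1000, t % 1000);
        /* Flush each line at once so the latest reading is already on
         * screen if the machine locks up. */
        fflush(stdout);
    }
}
```

For example, memset_with_temperature(buf, 1024 * 1024, 1024 * 16, "/sys/class/hwmon/hwmon0/temp1_input") would mirror Philip's loop; that sensor path is an assumption and differs between machines.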

In a couple more tries, I got:
  knotify killed with a SIGSEGV
  artsd killed with a SIGSEGV
  a hard lockup, requiring the big red button
  a second oops at the same address (81773ffc6918) as above.

My CPU, from /proc/cpuinfo, is:
model name  : AMD Athlon(tm) 64 Processor 3500+

My mainboard is an MSI K8N Neo2 Platinum.  I have 1 GByte of
Corsair XMS DDR400 memory.

I am not overclocking and I am running with standard voltages.

This is on a 2.6.11-rc5 kernel, though I doubt that matters.
I'm guessing it's hardware.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Denis Vlasenko
On Thursday 31 March 2005 07:38, Robert Hancock wrote:
> Philip Lawatsch wrote:
> > Hi,
> > 
> > 
> > I do have a very strange problem:
> > 
> > If I memset a ~1meg buffer some thousand times (in the userspace) it
> > will hardlock my machine.
> 
> I thought that this must be impossible, but I tried it on my machine 
> which is very similar (Asus A8N-SLI, Athlon 64 3500+, 2GB RAM) and to my 
> surprise it breaks on mine too with kernel 2.6.11. I tested using the 
> program below. After about a minute or so of this, the machine either 
> locked hard or rebooted spontaneously. When it locked, there was no oops 
> message, the NMI watchdog was not triggered and there was no response to 
>   SysRq commands. (I tested it with and without the NVIDIA module loaded.)
> 
> This seems pretty terrible, a perfectly legal program running as a 
> normal user is hard-locking the machine. Anyone have any suggestions to 
> debug this? Also, can somebody else on an x86_64 try and duplicate this?
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> int main( int argc, char* argv[] )
> {
>   char* test = malloc(512*1024*1024);
>   int i;
>   for( i=0; i<100; i++ )
>   {
>   memset( test, 0, 512*1024*1024);
>   }
>   free(test);
>   return 0;
> }

This reminds me of a VIA northbridge problem where the BIOS enabled
a feature which was experimental and turned out to be buggy.
It caused oopses ONLY on K7-optimized kernels, because of the
movntq stores they used. Those seem to put an awful lot of writes
on the bus.
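
For readers unfamiliar with these stores: non-temporal instructions such as MOVNTQ (MMX) and MOVNTI (SSE2) are write-combined and sent to memory without allocating cache lines, so a large clear built on them can generate a lot of sustained write traffic on the bus, as described above. Below is a minimal C sketch of a zeroing routine built on the SSE2 intrinsic _mm_stream_si32 (MOVNTI). It only illustrates the technique; it is not glibc's memset, and the name zero_nontemporal() is hypothetical. Builds without SSE2 fall back to plain stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical helper: zero len bytes at dst using non-temporal stores
 * where SSE2 is available, plain stores otherwise. */
void zero_nontemporal(void *dst, size_t len)
{
    unsigned char *p = dst;

    /* Plain byte stores until p is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = 0;
        --len;
    }

#if defined(__SSE2__)
    /* Bulk of the buffer: 4-byte MOVNTI stores that go through the
     * write-combining buffers instead of allocating cache lines. */
    while (len >= 4) {
        _mm_stream_si32((int *)p, 0);
        p += 4;
        len -= 4;
    }
    /* Non-temporal stores are weakly ordered; SFENCE makes them globally
     * visible before any later stores. */
    _mm_sfence();
#endif

    /* Remaining tail bytes (or the whole buffer without SSE2). */
    memset(p, 0, len);
}
```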
--
vda



Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Robert Hancock
Philip Lawatsch wrote:
> Hi,
> I do have a very strange problem:
> If I memset a ~1meg buffer some thousand times (in the userspace) it
> will hardlock my machine.

I thought that this must be impossible, but I tried it on my machine,
which is very similar (Asus A8N-SLI, Athlon 64 3500+, 2GB RAM), and to my
surprise it breaks on mine too with kernel 2.6.11. I tested using the
program below. After about a minute of this, the machine either
locked hard or rebooted spontaneously. When it locked, there was no oops
message, the NMI watchdog was not triggered and there was no response to
SysRq commands. (I tested it with and without the NVIDIA module loaded.)

This seems pretty terrible: a perfectly legal program running as a
normal user is hard-locking the machine. Does anyone have any suggestions
for debugging this? Also, can somebody else on x86_64 try to reproduce this?

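#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( int argc, char* argv[] )
{
  char* test = malloc(512*1024*1024);   /* 512 MB buffer */
  int i;
  if( test == NULL )
    return 1;   /* allocation failed, nothing to test */
  for( i=0; i<100; i++ )
  {
    memset( test, 0, 512*1024*1024);   /* zero all 512 MB, 100 times */
  }
  free(test);
  return 0;
}
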
Bootdata ok (command line is ro root=LABEL=/)
Linux version 2.6.11-1.7_FC3custom ([EMAIL PROTECTED]) (gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 Thu Mar 24 21:23:17 CST 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
 BIOS-e820: 7fff3000 - 8000 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: fefffc00 - ff00 (reserved)
 BIOS-e820:  - 0001 (reserved)
ACPI: RSDP (v000 Nvidia) @ 0x000f7d50
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff3040
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff30c0
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9640
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9580
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x1000 MSFT 0x010e) @ 0x
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 520176 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Nvidia board detected. Ignoring ACPI timer override.
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Checking aperture...
CPU 0: aperture @ 132000 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
Built 1 zonelists
Kernel command line: ro root=LABEL=/ console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2211.365 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2055568k/2097088k available (2722k kernel code, 40732k reserved, 1239k data, 188k init)
Calibrating delay loop... 4374.52 BogoMIPS (lpj=2187264)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3500+ stepping 00
Using local APIC NMI watchdog using perfctr0
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
checking if image is initramfs... it is
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050211
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - :00:09.0
ACPI: PCI 

Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Robert Hancock
Matthias-Christian Ott wrote:
> You want to allocate a lot of memory (16 GB); you don't have that much
> space, so the kernel hangs.

No, that is not what it is doing. The program simply wipes the same 1MB
block of memory over and over. And even if it were doing what you say, it
would not (or at least should not) lock up the machine.
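
The 16 GB figure is the total amount of data written, not the amount allocated: Philip's test zeroes one 1 MB stack buffer 16384 times, and the 512 MB test zeroes one heap buffer 100 times. A quick C check of that arithmetic (the helper names are made up for illustration):

```c
#include <stdio.h>

/* Total bytes written when a buffer of buffer_bytes is zeroed `passes`
 * times; the allocation itself never grows. */
unsigned long long total_bytes_written(unsigned long long buffer_bytes,
                                       unsigned long long passes)
{
    return buffer_bytes * passes;
}

/* Print allocation size vs. total write traffic for one test program. */
void report(const char *name, unsigned long long buffer_bytes,
            unsigned long long passes)
{
    unsigned long long total = total_bytes_written(buffer_bytes, passes);

    printf("%s: allocates %llu MiB, writes %llu MiB in total\n",
           name, buffer_bytes >> 20, total >> 20);
}
```

report("1 MB test", 1024ULL * 1024, 1024 * 16) prints "1 MB test: allocates 1 MiB, writes 16384 MiB in total" (16 GiB), and report("512 MB test", 512ULL * 1024 * 1024, 100) prints "512 MB test: allocates 512 MiB, writes 51200 MiB in total" (50 GiB).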

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Matthias-Christian Ott
Philip Lawatsch wrote:
Hi,
I have a very strange problem:
If I memset a ~1 MB buffer a few thousand times (in userspace), it
hardlocks my machine.
I've been using 2.6.12-rc1 and also a lot of other kernels (2.6.9,
2.6.11). I've tried it with both a 32 bit kernel and a 64 bit kernel.
When running on the 32 bit kernel the machine hardlocks after about
15000 iterations; on a 64 bit kernel it hardlocks after about
5000 (the 64 bit system has nearly no background jobs running).
I've been running memcheck for several hours now, but nothing showed up.
I've got an Asus A8N-SLI board with 2 gigs of memory and an AMD 3500+ CPU.
The 64 bit kernel was compiled using gcc 3.4.3 and the 32 bit kernel
using 3.3.5.
This simple program will kill my machine:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>  /* for memset() */
int main(int argc, char *argv[])
{
    char buf[1024*1024];
    int i;
    for (i = 0; i < 1024*16; ++i)
    {
        printf("%d\n", i);
        memset(buf, 0, 1024*1024);
    }
    printf("Done\n");
    return 0;
}
If I usleep for 1 ms after each memset, the whole thing happily runs
forever without any problems.
Also, if I start it twice (without sleeping in the loop), the machine won't
hardlock either (tested with a 32 bit kernel).
I'd really appreciate any pointers as to what might be wrong here.
I've tried both kernels with and without preemption.
kind regards Philip

Bootdata ok (command line is BOOT_IMAGE=test ro root=809)
Linux version 2.6.12-rc1 ([EMAIL PROTECTED]) (gcc version 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)) #1 Wed Mar 30 23:30:20 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820:  - 0009f800 (usable)
BIOS-e820: 0009f800 - 000a (reserved)
BIOS-e820: 000f - 0010 (reserved)
BIOS-e820: 0010 - 7fff (usable)
BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
BIOS-e820: 7fff3000 - 8000 (ACPI data)
BIOS-e820: e000 - f000 (reserved)
BIOS-e820: fec0 - fec01000 (reserved)
BIOS-e820: fee0 - fef0 (reserved)
BIOS-e820: fefffc00 - ff00 (reserved)
BIOS-e820:  - 0001 (reserved)
ACPI: RSDP (v000 Nvidia) @ 0x000f78c0
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff3040
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff30c0
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9540
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9480
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x1000 MSFT 0x010e) @ 0x
On node 0 totalpages: 524272
 DMA zone: 4096 pages, LIFO batch:1
 Normal zone: 520176 pages, LIFO batch:16
 HighMem zone: 0 pages, LIFO batch:1
Nvidia board detected. Ignoring ACPI timer override.
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: BOOT_IMAGE=test ro root=809 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2211.376 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2056168k/2097088k available (3281k kernel code, 40236k reserved, 1386k data, 188k init)
Calibrating delay loop... 4374.52 BogoMIPS (lpj=2187264)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3500+ stepping 00
Using local APIC NMI watchdog using perfctr0
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050211
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - :00:09.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI 

AMD64 Machine hardlocks when using memset

2005-03-30 Thread Philip Lawatsch
Hi,


I have a very strange problem:

If I memset a ~1 MB buffer a few thousand times (in userspace), it
hardlocks my machine.

I've been using 2.6.12-rc1 and also a lot of other kernels (2.6.9,
2.6.11). I've tried it with both a 32 bit kernel and a 64 bit kernel.
When running on the 32 bit kernel the machine hardlocks after about
15000 iterations; on a 64 bit kernel it hardlocks after about
5000 (the 64 bit system has nearly no background jobs running).

I've been running memcheck for several hours now, but nothing showed up.


I've got an Asus A8N-SLI board with 2 gigs of memory and an AMD 3500+ CPU.

The 64 bit kernel was compiled using gcc 3.4.3 and the 32 bit kernel
using 3.3.5.


This simple program will kill my machine:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>  /* for memset() */

int main(int argc, char *argv[])
{
    char buf[1024*1024];    /* ~1 MB buffer on the stack */
    int i;
    for (i = 0; i < 1024*16; ++i)
    {
        printf("%d\n", i);
        memset(buf, 0, 1024*1024);
    }
    printf("Done\n");
    return 0;
}

If I usleep for 1 ms after each memset, the whole thing happily runs
forever without any problems.
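
A minimal sketch of the loop with that pause, written as a C function so the pause length can be varied. It substitutes POSIX nanosleep() for usleep() (usleep() was dropped from later POSIX revisions), and the function name memset_passes() is hypothetical. A pause of 0 reproduces the original tight loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep() */
#endif
#include <string.h>
#include <time.h>

/* Zero buf `passes` times, sleeping pause_us microseconds after each
 * pass (0 means no pause, i.e. the loop that locked up the machine).
 * Returns the number of passes completed. */
int memset_passes(char *buf, size_t size, int passes, long pause_us)
{
    struct timespec delay;
    int done;

    delay.tv_sec = pause_us / 1000000L;
    delay.tv_nsec = (pause_us % 1000000L) * 1000L;

    for (done = 0; done < passes; ++done) {
        memset(buf, 0, size);
        if (pause_us > 0)
            nanosleep(&delay, NULL);  /* 1000 us = the 1 ms pause */
    }
    return done;
}
```

memset_passes(buf, 1024 * 1024, 1024 * 16, 1000) corresponds to the variant that ran forever; memset_passes(buf, 1024 * 1024, 1024 * 16, 0) to the one that hung.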

Also, if I start it twice (without sleeping in the loop), the machine won't
hardlock either (tested with a 32 bit kernel).

I'd really appreciate any pointers as to what might be wrong here.

I've tried both kernels with and without preemption.

kind regards Philip


Bootdata ok (command line is BOOT_IMAGE=test ro root=809)
Linux version 2.6.12-rc1 ([EMAIL PROTECTED]) (gcc version 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)) #1 Wed Mar 30 23:30:20 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
 BIOS-e820: 7fff3000 - 8000 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: fefffc00 - ff00 (reserved)
 BIOS-e820:  - 0001 (reserved)
ACPI: RSDP (v000 Nvidia) @ 0x000f78c0
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff3040
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff30c0
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9540
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9480
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x1000 MSFT 0x010e) @ 0x
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 520176 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Nvidia board detected. Ignoring ACPI timer override.
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: BOOT_IMAGE=test ro root=809 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2211.376 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2056168k/2097088k available (3281k kernel code, 40236k reserved, 1386k data, 188k init)
Calibrating delay loop... 4374.52 BogoMIPS (lpj=2187264)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3500+ stepping 00
Using local APIC NMI watchdog using perfctr0
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050211
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - :00:09.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI 

Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Denis Vlasenko
On Thursday 31 March 2005 07:38, Robert Hancock wrote:
> Philip Lawatsch wrote:
>> Hi,
>>
>>
>> I do have a very strange problem:
>>
>> If I memset a ~1meg buffer some thousand times (in the userspace) it
>> will hardlock my machine.
>
> I thought that this must be impossible, but I tried it on my machine
> which is very similar (Asus A8N-SLI, Athlon 64 3500+, 2GB RAM) and to my
> surprise it breaks on mine too with kernel 2.6.11. I tested using the
> program below. After about a minute or so of this, the machine either
> locked hard or rebooted spontaneously. When it locked, there was no oops
> message, the NMI watchdog was not triggered and there was no response to
> SysRq commands. (I tested it with and without the NVIDIA module loaded.)
>
> This seems pretty terrible, a perfectly legal program running as a
> normal user is hard-locking the machine. Anyone have any suggestions to
> debug this? Also, can somebody else on an x86_64 try and duplicate this?
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main( int argc, char* argv[] )
> {
>   /* 512 MB buffer, far larger than the CPU caches */
>   char* test = malloc(512*1024*1024);
>   int i;
>   if( test == NULL )
>   {
>     perror("malloc");
>     return 1;
>   }
>   /* zero the whole buffer 100 times in a row */
>   for( i=0; i<100; i++ )
>   {
>     memset( test, 0, 512*1024*1024);
>   }
>   free(test);
>   return 0;
> }

This reminds me of a VIA northbridge problem where the BIOS enabled
a feature which was experimental and turned out to be buggy.
It caused oopses ONLY on K7-optimized kernels because
of the movntq stores they used. Those seem to put an awful lot of writes
on the bus.
--
vda



Re: AMD64 Machine hardlocks when using memset

2005-03-30 Thread Paul Jackson
Yup - kills my x86_64 too.  I can't stay up for half a minute.
I got a couple of oopses:

  Unable to handle kernel paging request at 2730 RIP:

  Unable to handle kernel paging request at 81773ffc6918 RIP:

The first try ended with a sudden reboot.  The second time, I ctrl-C'd
out while I still had a responsive system.

I thought it might be a CPU temperature issue, so I downloaded XMBmon
(Mother Board Monitor Program for X Window System) and hacked its
command-line tool, mbmon, to add this memset loop and report the CPU
temperature each time around the loop.

My CPU Temp went from its usual 39 C idle, to 45 C during the memset
loop, which are typical temperatures for this PC.  No problem there.

In a couple more tries, I got:
  knotify killed with a SIGSEGV
  artsd killed with a SIGSEGV
  a hard lockup, requiring the big red button
  a second oops at the same address (81773ffc6918) as above.

My CPU, from /proc/cpuinfo, is:
model name  : AMD Athlon(tm) 64 Processor 3500+

My mainboard is an MSI K8N Neo2 Platinum.  I have 1 GByte of
Corsair XMS DDR400 memory.

I am not overclocking and I am running with standard voltages.

This is on a 2.6.11-rc5 kernel, though I doubt that matters.
I'm guessing it's hardware.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401