Re: AMD64 Machine hardlocks when using memset
Philip Lawatsch wrote:
>> Anyone have any suggestions on how to track this further? It seems
>> fairly clear what circumstances are causing it, but as for figuring out
>> what's at fault..
>
> It seems that mov'ing does not kill my machine while simply using movnti
> does.

Forget about what I just wrote. I've been able to reproduce this in 32-bit mode too, although it did take a long while to happen. And glibc in 32-bit mode simply uses mov in a normal loop to write to the memory.

It looks like using mov in 64-bit mode polluted my cache and crippled performance (I have been running some other programs in the background), and thus perhaps didn't trigger the problem.

I'm going nuts with this.

kind regards Philip

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: AMD64 Machine hardlocks when using memset
On Wed, 2005-04-06 at 12:59 +0200, Philip Lawatsch wrote:
> Robert Hancock wrote:
> > Alan Cox wrote:
> >
> >> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> >>
> >>> I'm wondering if one does a ton of these cache-bypassing stores
> >>> whether something gets hosed because of that. Not sure what that
> >>> could be though. I don't imagine the chipset is involved with any of
> >>> that on the Athlon 64 - either the CPU or RAM seems the most likely
> >>> suspect to me
> >>
> >> The glibc version is essentially the "perfect" copy function for the
> >> CPU. If you have any bus/memory problems or chipset bugs it will bite
> >> you.
> >
> > Anyone have any suggestions on how to track this further? It seems
> > fairly clear what circumstances are causing it, but as for figuring out
> > what's at fault..
>
> Digging through my glibc's source I found that if you memset arrays
> <12 bytes it will use good old mov instructions to do the job. In
> case of arrays larger than 12 bytes it will use movnti instructions
> to do the job.
>
> Thus I refined my test code to use mov for memset regardless of the size
> (simply abused glibc's code a little bit)
>
> -> No crash!
>
> Then, changing all the mov to movnti and my machine frags again :(
>
> It seems that mov'ing does not kill my machine while simply using movnti
> does.

movnti also gets a higher bandwidth so that doesn't rule out too much..
Re: AMD64 Machine hardlocks when using memset
Robert Hancock wrote:
> Alan Cox wrote:
>
>> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
>>
>>> I'm wondering if one does a ton of these cache-bypassing stores
>>> whether something gets hosed because of that. Not sure what that
>>> could be though. I don't imagine the chipset is involved with any of
>>> that on the Athlon 64 - either the CPU or RAM seems the most likely
>>> suspect to me
>>
>> The glibc version is essentially the "perfect" copy function for the
>> CPU. If you have any bus/memory problems or chipset bugs it will bite
>> you.
>
> Anyone have any suggestions on how to track this further? It seems
> fairly clear what circumstances are causing it, but as for figuring out
> what's at fault..

Digging through my glibc's source I found that if you memset arrays <12 bytes it will use good old mov instructions to do the job. In case of arrays larger than 12 bytes it will use movnti instructions to do the job.

Thus I refined my test code to use mov for memset regardless of the size (simply abused glibc's code a little bit).

-> No crash!

Then, changing all the mov to movnti, my machine frags again :(

It seems that mov'ing does not kill my machine while simply using movnti does.

kind regards Philip
Re: AMD64 Machine hardlocks when using memset
Rafael J. Wysocki wrote:
>> Anyone have any suggestions on how to track this further? It seems
>> fairly clear what circumstances are causing it, but as for figuring out
>> what's at fault..
>
> Well, I would start from changing memory modules.

As I wrote earlier, I tried 4 different (but same brand) modules, 2 Infineon and 2 Samsung ones. No difference.

Btw, I've been working (stressing) the machine for one week now and never had any problems, the system seems rock solid (until I start my memory stresser).

kind regards Philip
Re: AMD64 Machine hardlocks when using memset
[disregard my previous mail. I should have read the whole thread first]

On Saturday 02 April 2005 07:50, Robert Hancock wrote:
> As it turns out, the memset in my version of glibc x86_64 is not using
> such a string instruction though - it seems to be using two different
> sets of instructions depending on the size of the memset (not sure
> exactly how they're calculating the threshold between these..) For sizes
> below the threshold, this is the inner loop - it's using normal mov
> instructions:
>
> 3:      /* Copy 64 bytes. */
>         mov %r8,(%rcx)
>         mov %r8,0x8(%rcx)
>         mov %r8,0x10(%rcx)
>         mov %r8,0x18(%rcx)
>         mov %r8,0x20(%rcx)
>         mov %r8,0x28(%rcx)
>         mov %r8,0x30(%rcx)
>         mov %r8,0x38(%rcx)
>         add $0x40,%rcx
>         dec %rax
>         jne 3b
>
> For sizes above the threshold though, this is the inner loop. It's using
> movnti which is an SSE cache-bypassing store:
>
> 11:     /* Copy 64 bytes without polluting the cache. */
>         /* We could use movntdq %xmm0,(%rcx) here to further
>            speed up for large cases but let's not use XMM registers. */
>         movnti %r8,(%rcx)
>         movnti %r8,0x8(%rcx)
>         movnti %r8,0x10(%rcx)
>         movnti %r8,0x18(%rcx)
>         movnti %r8,0x20(%rcx)
>         movnti %r8,0x28(%rcx)
>         movnti %r8,0x30(%rcx)
>         movnti %r8,0x38(%rcx)
>         add $0x40,%rcx
>         dec %rax
>         jne 11b

This is a very rarely used instruction. People either do plain old rep stosl or do 3DNOW or SSE2 non-temporal stores. Maybe movnti is different (buggy?) in some subtle way.

Does it blow up if you use 3DNOW or SSE2 non-temporal stores? If yes, then try a different BIOS (the latest is not necessarily the best).

BTW, the 'Athlon bug' was tracked down similarly. A new BIOS enabled a buggy chipset feature - BOOM! Non-temporals killed the box (it took several months to figure out back then).
--
vda
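[Editorial sketch, not part of the original mail.] The SSE2 variant vda asks about could look roughly like the following. This is a minimal sketch, assuming GCC or Clang with SSE2 available; the function name memset_movntdq is hypothetical, and the aligned-head handling and the trailing sfence are assumptions of this sketch rather than anything taken from glibc. On targets without SSE2 it simply falls back to memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si128, _mm_set1_epi8, _mm_sfence */
#endif

/* Hypothetical helper: fill `bytes` bytes at `dst` with the byte `c`,
 * using SSE2 non-temporal stores (movntdq) for the 16-byte-aligned body. */
void memset_movntdq(void *dst, int c, size_t bytes)
{
    unsigned char *p = dst;
#if defined(__SSE2__)
    /* movntdq needs a 16-byte-aligned address, so fill the
     * unaligned head with ordinary stores first. */
    size_t head = (16 - ((uintptr_t)p & 15)) & 15;
    if (head > bytes)
        head = bytes;
    memset(p, c, head);
    p += head;
    bytes -= head;

    __m128i v = _mm_set1_epi8((char)c);
    for (; bytes >= 16; bytes -= 16, p += 16)
        _mm_stream_si128((__m128i *)p, v);   /* cache-bypassing store */

    /* Non-temporal stores are weakly ordered; fence before returning. */
    _mm_sfence();
#endif
    /* Tail (or the whole buffer when SSE2 is not available). */
    memset(p, c, bytes);
}
```

Swapping this in for memset() in the crashing test would show whether the lockup follows non-temporal stores in general or movnti specifically.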
Re: AMD64 Machine hardlocks when using memset
Hi,

On Wednesday, 6 of April 2005 06:05, Robert Hancock wrote:
> Alan Cox wrote:
> > On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> >
> >> I'm wondering if one does a ton of these cache-bypassing stores whether
> >> something gets hosed because of that. Not sure what that could be
> >> though. I don't imagine the chipset is involved with any of that on the
> >> Athlon 64 - either the CPU or RAM seems the most likely suspect to me
> >
> > The glibc version is essentially the "perfect" copy function for the
> > CPU. If you have any bus/memory problems or chipset bugs it will bite
> > you.
>
> Anyone have any suggestions on how to track this further? It seems
> fairly clear what circumstances are causing it, but as for figuring out
> what's at fault..

Well, I would start from changing memory modules.

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"
Re: AMD64 Machine hardlocks when using memset
Alan Cox wrote:
> On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
>
>> I'm wondering if one does a ton of these cache-bypassing stores whether
>> something gets hosed because of that. Not sure what that could be
>> though. I don't imagine the chipset is involved with any of that on the
>> Athlon 64 - either the CPU or RAM seems the most likely suspect to me
>
> The glibc version is essentially the "perfect" copy function for the
> CPU. If you have any bus/memory problems or chipset bugs it will bite
> you.

Anyone have any suggestions on how to track this further? It seems fairly clear what circumstances are causing it, but as for figuring out what's at fault..

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: AMD64 Machine hardlocks when using memset
On Sad, 2005-04-02 at 05:50, Robert Hancock wrote:
> I'm wondering if one does a ton of these cache-bypassing stores whether
> something gets hosed because of that. Not sure what that could be
> though. I don't imagine the chipset is involved with any of that on the
> Athlon 64 - either the CPU or RAM seems the most likely suspect to me

The glibc version is essentially the "perfect" copy function for the CPU. If you have any bus/memory problems or chipset bugs it will bite you.

Alan
Re: AMD64 Machine hardlocks when using memset
Paul Jackson wrote:
> The x86_64 memset(), both in user space and the kernel, for whatever gcc
> I have, and for a current kernel, uses the "repz stos" or "rep stosq"
> prefixed instruction for the bulk of the copy. This combination is a
> long running, interruptible Intel string instruction that loops on
> itself until the CX register decrements to zero.
>
> Was your windows app using "stos"?
>
> I'll wager a nickel that the actual crash you see comes when the
> processor has to handle an interrupt while in the middle of this
> instruction. I'll wager a dime it's hardware, though interrupt activity
> may be required to provoke it.

I ended up making a test program which essentially did the same thing except not using memset (just moving an int* up repeatedly and setting the value there to 0). That worked fine on both Windows and Linux. I then tried such a program using a long* compiled as 64-bit on Linux, and that also worked fine. It seems like I can only reproduce it when memset is actually used..

I don't remember exactly what the Windows memset was using, that was on my work machine - it was inline assembly though, and I do know that it had only one instruction for the whole set, so it was likely "repz stos" or something similar to that.

As it turns out, the memset in my version of glibc x86_64 is not using such a string instruction though - it seems to be using two different sets of instructions depending on the size of the memset (not sure exactly how they're calculating the threshold between these..) For sizes below the threshold, this is the inner loop - it's using normal mov instructions:

3:      /* Copy 64 bytes. */
        mov %r8,(%rcx)
        mov %r8,0x8(%rcx)
        mov %r8,0x10(%rcx)
        mov %r8,0x18(%rcx)
        mov %r8,0x20(%rcx)
        mov %r8,0x28(%rcx)
        mov %r8,0x30(%rcx)
        mov %r8,0x38(%rcx)
        add $0x40,%rcx
        dec %rax
        jne 3b

For sizes above the threshold though, this is the inner loop. It's using movnti which is an SSE cache-bypassing store:

11:     /* Copy 64 bytes without polluting the cache. */
        /* We could use movntdq %xmm0,(%rcx) here to further
           speed up for large cases but let's not use XMM registers. */
        movnti %r8,(%rcx)
        movnti %r8,0x8(%rcx)
        movnti %r8,0x10(%rcx)
        movnti %r8,0x18(%rcx)
        movnti %r8,0x20(%rcx)
        movnti %r8,0x28(%rcx)
        movnti %r8,0x30(%rcx)
        movnti %r8,0x38(%rcx)
        add $0x40,%rcx
        dec %rax
        jne 11b

I'm wondering if one does a ton of these cache-bypassing stores whether something gets hosed because of that. Not sure what that could be though. I don't imagine the chipset is involved with any of that on the Athlon 64 - either the CPU or RAM seems the most likely suspect to me

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: AMD64 Machine hardlocks when using memset
Robert wrote:
> It does run visibly slower

The x86_64 memset(), both in user space and the kernel, for whatever gcc I have, and for a current kernel, uses the "repz stos" or "rep stosq" prefixed instruction for the bulk of the copy. This combination is a long running, interruptible Intel string instruction that loops on itself until the CX register decrements to zero.

Was your windows app using "stos"?

I'll wager a nickel that the actual crash you see comes when the processor has to handle an interrupt while in the middle of this instruction. I'll wager a dime it's hardware, though interrupt activity may be required to provoke it.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
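[Editorial sketch, not part of the original mail.] For anyone wanting to test the string-instruction path in isolation, a "rep stosq" fill might be written along these lines. This is a sketch assuming GCC or Clang on x86-64; the name memset_rep_stos is made up for illustration, and other targets get a plain store loop instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `bytes` bytes at `dst` with the byte `c`
 * using a single "rep stosq" for the bulk of the buffer. */
void memset_rep_stos(void *dst, int c, size_t bytes)
{
    unsigned char *p = dst;
    size_t qwords = bytes / 8;
    /* Replicate the fill byte into all eight bytes of a 64-bit word. */
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RCX = count, RAX = value. The instruction
     * repeats until RCX reaches zero and can be interrupted and resumed
     * between iterations. The ABI guarantees the direction flag is
     * clear, so RDI moves forward. */
    __asm__ volatile("rep stosq"
                     : "+D"(p), "+c"(qwords)
                     : "a"(pattern)
                     : "memory");
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    for (; qwords; qwords--, p += 8)
        memcpy(p, &pattern, 8);
#endif
    /* Remaining 0..7 bytes. */
    memset(p, c, bytes % 8);
}
```

If a fill like this locks the machine as readily as the glibc memset does, that would point away from movnti and towards plain memory bandwidth.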
Re: AMD64 Machine hardlocks when using memset
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
>
>> This is getting pretty ridiculous.. I've tried memory timings down to
>> the slowest possible, ran Memtest86 for 4 passes with no errors, and
>> it's been stable in Windows for a few months now. Still something is
>> blowing up in Linux with this test though..
>
> Have you run the same memset test under windows?
>
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

I've now built a similar test program for Windows. I've let it run over 2000 iterations of 512MB memsets with no problems. On Linux it usually blew up with under 200 iterations.

It does run visibly slower than the Linux version though - this is after all 32 bit Windows and it was compiled with crufty old Visual C++ 6.0 so it is probably not that optimized for this CPU. I will see if I can get a more optimized build of this to try in Mingw32 or something.. after all if it's related to some instruction combination or something it may not show up in the build I have.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
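[Editorial sketch, not part of the original mail.] The test program itself is not posted in this part of the thread, so the following is only a guess at its general shape: repeated memset() calls over one large buffer. The function name stress_memset, the changing fill byte and the read-back check are all assumptions of this sketch; the thread reports a hard lock, not data corruption, so the check is merely a convenience.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stress loop: memset a buffer of `bytes` bytes
 * `iterations` times with a changing fill byte and read it back.
 * Returns the number of mismatching bytes seen, or -1 if the
 * buffer could not be allocated. */
long stress_memset(size_t bytes, unsigned iterations)
{
    unsigned char *buf = malloc(bytes);
    long bad = 0;

    if (!buf)
        return -1;

    for (unsigned it = 0; it < iterations; it++) {
        unsigned char fill = (unsigned char)(it & 0xFF);
        /* Read back through a volatile pointer so the compiler
         * cannot assume the result of memset and drop the check. */
        const volatile unsigned char *check = buf;

        memset(buf, fill, bytes);
        for (size_t i = 0; i < bytes; i++)
            if (check[i] != fill)
                bad++;
    }

    free(buf);
    return bad;
}
```

A call such as stress_memset((size_t)512 << 20, 2000) would match the sizes and iteration count mentioned above.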
Re: AMD64 Machine hardlocks when using memset
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
>
>> This is getting pretty ridiculous.. I've tried memory timings down to
>> the slowest possible, ran Memtest86 for 4 passes with no errors, and
>> it's been stable in Windows for a few months now. Still something is
>> blowing up in Linux with this test though..
>
> Have you run the same memset test under windows?
>
> I've traced a lot of oddball problems down to bad or marginal power
> supplies.

So far I've tried 2 PSUs and 3 different brands of memory. No differences. And since I don't have Windows, I can't really test it there.

I'll try a different (not based on nforce 4) motherboard now.

kind regards Philip
Re: AMD64 Machine hardlocks when using memset
On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> This is getting pretty ridiculous.. I've tried memory timings down to
> the slowest possible, ran Memtest86 for 4 passes with no errors, and
> it's been stable in Windows for a few months now. Still something is
> blowing up in Linux with this test though..

Have you run the same memset test under windows?

I've traced a lot of oddball problems down to bad or marginal power supplies.

Ray
Re: AMD64 Machine hardlocks when using memset
On Friday 01 April 2005 07:37, Robert Hancock wrote:
> Stelian Pop wrote:
> > Just a thought: does deactivating cpufreq change anything ?
> >
> > I haven't tested yet your program, but on my Asus K8NE-Deluxe very
> > strange things happen if cpufreq/powernow is activated *and*
> > the cpu frequency is changed...
>
> Didn't change anything for me, I tried deactivating cpufreq, still
> crashes when I run that test program.
>
> This is getting pretty ridiculous.. I've tried memory timings down to
> the slowest possible, ran Memtest86 for 4 passes with no errors, and
> it's been stable in Windows for a few months now. Still something is
> blowing up in Linux with this test though..

If you want to dig deeper, go to assembler level. That is, instead of using memset(), disassemble your program and make your own

void my_memset(...)
{
	asm volatile(/* code sequence from your crashing prog */);
}

and use that in your memsetting loop. Sure, it won't change anything, but:

a) we will know exactly which instruction sequence drives your CPU/chipset crazy
b) others can try to reproduce without danger of memset being implemented differently on their particular version of gcc/glibc/whatever
c) you can try other memsets in order to know more about this bug (for example, if inserting some NOPs in the my_memset body makes the bug disappear, that will definitely point towards a defective/overheating CPU, etc...)
--
vda
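[Editorial sketch, not part of the original mail.] Filled in with the movnti loop quoted elsewhere in this thread, such a my_memset could look roughly like this. It is a sketch assuming GCC or Clang on x86-64; the name my_memset_movnti, the operand names and the trailing sfence are assumptions of this example, not glibc's actual code. On other targets it falls back to ordinary stores so that it still builds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `bytes` bytes at `dst` with the byte `c`,
 * writing 64 bytes per loop iteration with movnti. */
void my_memset_movnti(void *dst, int c, size_t bytes)
{
    unsigned char *p = dst;
    size_t blocks = bytes / 64;
    /* Replicate the fill byte into all eight bytes of a 64-bit word. */
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

#if defined(__x86_64__) && defined(__GNUC__)
    if (blocks) {
        __asm__ volatile(
            "1:\n\t"
            "movnti %[v], (%[p])\n\t"
            "movnti %[v], 0x8(%[p])\n\t"
            "movnti %[v], 0x10(%[p])\n\t"
            "movnti %[v], 0x18(%[p])\n\t"
            "movnti %[v], 0x20(%[p])\n\t"
            "movnti %[v], 0x28(%[p])\n\t"
            "movnti %[v], 0x30(%[p])\n\t"
            "movnti %[v], 0x38(%[p])\n\t"
            "add $0x40, %[p]\n\t"
            "dec %[n]\n\t"
            "jne 1b\n\t"
            /* Not in the quoted loop: order the weakly-ordered
             * non-temporal stores before returning. */
            "sfence\n\t"
            : [p] "+r"(p), [n] "+r"(blocks)
            : [v] "r"(pattern)
            : "memory", "cc");
    }
#else
    /* Portable fallback: ordinary 8-byte stores, 64 bytes per pass. */
    for (; blocks; blocks--, p += 64)
        for (int i = 0; i < 8; i++)
            memcpy(p + 8 * i, &pattern, 8);
#endif
    /* Remaining 0..63 bytes. */
    memset(p, c, bytes % 64);
}
```

Replacing movnti with mov in the template gives the cached variant, and adding "nop" lines between the stores gives the experiment suggested in (c).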
Re: AMD64 Machine hardlocks when using memset
On Friday 01 April 2005 07:37, Robert Hancock wrote: Stelian Pop wrote: Just a thought: does deactivating cpufreq change anything ? I haven't tested yet your program, but on my Asus K8NE-Deluxe very strange things happen if cpufreq/powernow is activated *and* the cpu frequency is changed... Didn't change anything for me, I tried deactivating cpufreq, still crashes when I run that test program. This is getting pretty ridiculous.. I've tried memory timings down to the slowest possible, ran Memtest86 for 4 passes with no errors, and it's been stable in Windows for a few months now. Still something is blowing up in Linux with this test though.. If you want to dig deeper, go to assembler level. That is, instead of using memset(), disassemble your program and make your own void my_memset(...) { asm volatile(/* code sequence from your crashing prog*/); } and use that in your memsetting loop. Sure, it won't change anything, but: a) we will know exactly which instruction sequence drives your CPU/chipset crazy b) others can try to reproduce without danger of memset being implemented differently on their perticular version of gcc/glibc/whatever c) you can try other memsets in order to know more about this bug (for example, if inserting some NOPs in the my_memset body makes bug disappear will definitely point towards defective/ overheating CPU. etc...) -- vda - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: AMD64 Machine hardlocks when using memset
On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote: This is getting pretty ridiculous.. I've tried memory timings down to the slowest possible, ran Memtest86 for 4 passes with no errors, and it's been stable in Windows for a few months now. Still something is blowing up in Linux with this test though.. Have you run the same memset test under windows? I've traced a lot of oddball problems down to bad or marginal power supplies. Ray - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: AMD64 Machine hardlocks when using memset
Ray Lee wrote: On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote: This is getting pretty ridiculous.. I've tried memory timings down to the slowest possible, ran Memtest86 for 4 passes with no errors, and it's been stable in Windows for a few months now. Still something is blowing up in Linux with this test though.. Have you run the same memset test under windows? I've traced a lot of oddball problems down to bad or marginal power supplies. So far I've tried 2 PSUs and 3 different brands of memory. No differences. And due to a lack of windows I cant really test it. I'll try a different (not based on nforce 4) motherboard now. kind regards Philip - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: AMD64 Machine hardlocks when using memset
Ray Lee wrote:
> On Thu, 2005-03-31 at 22:37 -0600, Robert Hancock wrote:
> > This is getting pretty ridiculous.. I've tried memory timings down to the slowest possible, ran Memtest86 for 4 passes with no errors, and it's been stable in Windows for a few months now. Still something is blowing up in Linux with this test though..
>
> Have you run the same memset test under windows?
>
> I've traced a lot of oddball problems down to bad or marginal power supplies.

I've now built a similar test program for Windows. I've let it run over 2000 iterations of 512MB memsets with no problems. On Linux it usually blew up with under 200 iterations.

It does run visibly slower than the Linux version though - this is after all 32 bit Windows and it was compiled with crufty old Visual C++ 6.0 so it is probably not that optimized for this CPU. I will see if I can get a more optimized build of this to try in Mingw32 or something.. after all if it's related to some instruction combination or something it may not show up in the build I have.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: AMD64 Machine hardlocks when using memset
Robert wrote:
> It does run visibly slower

The x86_64 memset(), both in user space and the kernel, for whatever gcc I have, and for a current kernel, uses the "repz stos" or "rep stosq" prefixed instruction for the bulk of the copy. This combination is a long running, interruptible Intel string instruction that loops on itself until the CX register decrements to zero.

Was your windows app using stos?

I'll wager a nickel that the actual crash you see comes when the processor has to handle an interrupt while in the middle of this instruction.

I'll wager a dime it's hardware, though interrupt activity may be required to provoke it.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
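For anyone who wants to exercise the string instruction on its own, here is a rough sketch of a fill routine built around "rep stosq". The function name fill_rep_stosq is made up for illustration; it assumes GCC-style inline asm on x86-64, relies on the direction flag being clear as the ABI requires, and uses a plain loop on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `value` into `count` consecutive 64-bit words starting at dst. */
static void fill_rep_stosq(uint64_t *dst, uint64_t value, size_t count)
{
    assert(dst != NULL || count == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RCX = words remaining, RAX = value to store.
     * The instruction can be interrupted between iterations and resumed,
     * since RDI and RCX always describe the work that is left. */
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(count)
                     : "a"(value)
                     : "memory");
#else
    while (count--)
        *dst++ = value;   /* portable fallback */
#endif
}
```

To compare against memset(), call fill_rep_stosq((uint64_t *)test, 0, size / 8) inside the same loop.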
Re: AMD64 Machine hardlocks when using memset
Paul Jackson wrote:
> The x86_64 memset(), both in user space and the kernel, for whatever gcc I have, and for a current kernel, uses the "repz stos" or "rep stosq" prefixed instruction for the bulk of the copy. This combination is a long running, interruptible Intel string instruction that loops on itself until the CX register decrements to zero.
>
> Was your windows app using stos?
>
> I'll wager a nickel that the actual crash you see comes when the processor has to handle an interrupt while in the middle of this instruction.
>
> I'll wager a dime it's hardware, though interrupt activity may be required to provoke it.

I ended up making a test program which essentially did the same thing except not using memset (just moving an int* up repeatedly and setting the value there to 0). That worked fine on both Windows and Linux. I then tried such a program using a long* compiled as 64-bit on Linux, that also worked fine. It seems like I can only reproduce it when memset is actually used..

I don't remember exactly what the Windows memset was using, that was on my work machine - it was inline assembly though, and I do know that it had only one instruction for the whole set, so it was likely repz stos or something similar to that.

As it turns out, the memset in my version of glibc x86_64 is not using such a string instruction though - it seems to be using two different sets of instructions depending on the size of the memset (not sure exactly how they're calculating the threshold between these..)

For sizes below the threshold, this is the inner loop - it's using normal mov instructions:

3:	/* Copy 64 bytes.  */
	mov	%r8,(%rcx)
	mov	%r8,0x8(%rcx)
	mov	%r8,0x10(%rcx)
	mov	%r8,0x18(%rcx)
	mov	%r8,0x20(%rcx)
	mov	%r8,0x28(%rcx)
	mov	%r8,0x30(%rcx)
	mov	%r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	3b

For sizes above the threshold though, this is the inner loop. It's using movnti which is an SSE2 cache-bypassing store:

11:	/* Copy 64 bytes without polluting the cache.  */
	/* We could use	movntdq %xmm0,(%rcx) here to further
	   speed up for large cases but let's not use XMM registers.  */
	movnti	%r8,(%rcx)
	movnti	%r8,0x8(%rcx)
	movnti	%r8,0x10(%rcx)
	movnti	%r8,0x18(%rcx)
	movnti	%r8,0x20(%rcx)
	movnti	%r8,0x28(%rcx)
	movnti	%r8,0x30(%rcx)
	movnti	%r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	11b

I'm wondering if one does a ton of these cache-bypassing stores whether something gets hosed because of that. Not sure what that could be though. I don't imagine the chipset is involved with any of that on the Athlon 64 - either the CPU or RAM seems the most likely suspect to me

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
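The cache-bypassing variant can be isolated from glibc in a small C routine, roughly as sketched below. This is not glibc's code: the name fill_movnti is hypothetical, the loop is not unrolled like the 64-byte loop in the disassembly, and the trailing sfence is an addition so that later code sees the stores in order. It assumes GCC-style inline asm on x86-64 and degrades to ordinary stores elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write `value` to `count` 64-bit words at dst using non-temporal stores. */
static void fill_movnti(uint64_t *dst, uint64_t value, size_t count)
{
    assert(dst != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
#if defined(__x86_64__) && defined(__GNUC__)
        /* movnti: store a general-purpose register straight to memory,
         * hinting the CPU not to allocate a cache line for it. */
        __asm__ volatile("movnti %1, %0"
                         : "=m"(dst[i])
                         : "r"(value));
#else
        dst[i] = value;   /* portable fallback, goes through the cache */
#endif
    }

#if defined(__x86_64__) && defined(__GNUC__)
    /* Non-temporal stores are weakly ordered; fence before returning. */
    __asm__ volatile("sfence" ::: "memory");
#endif
}
```

Swapping "movnti" for "mov" in the asm string gives the cached counterpart, so the two store types can be compared with everything else held equal.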
Re: AMD64 Machine hardlocks when using memset
Stelian Pop wrote:
> Just a thought: does deactivating cpufreq change anything ? I haven't tested yet your program, but on my Asus K8NE-Deluxe very strange things happen if cpufreq/powernow is activated *and* the cpu frequency is changed...

Didn't change anything for me, I tried deactivating cpufreq, still crashes when I run that test program.

This is getting pretty ridiculous.. I've tried memory timings down to the slowest possible, ran Memtest86 for 4 passes with no errors, and it's been stable in Windows for a few months now. Still something is blowing up in Linux with this test though..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: AMD64 Machine hardlocks when using memset
Philip Lawatsch wrote:
> I've now tried the most conservative settings available. The 32 bit kernel now hangs after about 15 Iterations (compared to about 16000 before) but the 64 bit kernel still hangs after about 5000.

I'm still seeing this on my system as well, using the most conservative timings possible (DDR200, all delay parameters except the refresh time set to the largest possible value) as well as DDR333 with the same timings and DDR400 with everything set to auto. I also tried the kernel on the Fedora Core 3 rescue disc (same crash) and in single user mode (same crash).

So far, the crashes have consisted of either a hang, reboot or panic. One panic was a "spinlock already locked at kernel/module.c:2022" error. The other one is below, for what it's worth:

Mar 31 18:55:43 Newcastle kernel: Unable to handle kernel paging request at 8100588f5000 RIP:
Mar 31 18:55:43 Newcastle kernel: 80236ac7{clear_page+7}
Mar 31 18:55:43 Newcastle kernel: PGD 8063 PUD a063 PMD 0
Mar 31 18:55:43 Newcastle kernel: Oops: 0002 [1]
Mar 31 18:55:43 Newcastle kernel: CPU 0
Mar 31 18:55:43 Newcastle kernel: Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) it87(U) i2c_sensor(U) i2c_isa(U) i2c_dev(U) i2c_core(U) sunrpc(U) pcmcia(U) yenta_socket(U) rsrc_nonstatic(U) pcmcia_core(U) joydev(U) nls_utf8(U) ntfs(U) vfat(U) fat(U) dm_mod(U) video(U) button(U) battery(U) ac(U) usb_storage(U) ohci1394(U) ieee1394(U) ohci_hcd(U) ehci_hcd(U) snd_ice1724(U) snd_ice17xx_ak4xxx(U) snd_ac97_codec(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd_ak4xxx_adda(U) snd_mpu401_uart(U) snd_rawmidi(U) snd_seq_device(U) snd(U) soundcore(U) forcedeth(U) floppy(U) ext3(U) jbd(U) sata_nv(U) libata(U) sd_mod(U) scsi_mod(U)
Mar 31 18:55:43 Newcastle kernel: Pid: 4928, comm: crashtest Not tainted 2.6.11-1.7_FC3custom
Mar 31 18:55:43 Newcastle kernel: RIP: 0010:[80236ac7] 80236ac7{clear_page+7}
Mar 31 18:55:43 Newcastle kernel: RSP: :810078299ca0 EFLAGS: 00010246
Mar 31 18:55:43 Newcastle kernel: RAX: RBX: 0001 RCX: 0200
Mar 31 18:55:43 Newcastle kernel: RDX: 80478940 RSI: RDI: 8100588f5000
Mar 31 18:55:43 Newcastle kernel: RBP: 81000235f5d0 R08: R09:
Mar 31 18:55:43 Newcastle kernel: R10: 000552fa R11: R12: 8100
Mar 31 18:55:43 Newcastle kernel: R13: 81000235f598 R14: 6db6db6db6db6db7 R15:
Mar 31 18:55:43 Newcastle kernel: FS: 2aabeb00() GS:80552300() knlGS:
Mar 31 18:55:43 Newcastle kernel: CS: 0010 DS: ES: CR0: 8005003b
Mar 31 18:55:43 Newcastle kernel: CR2: 8100588f5000 CR3: 791b CR4: 06e0
Mar 31 18:55:43 Newcastle kernel: Process crashtest (pid: 4928, threadinfo 810078298000, task 8100788d67e0)
Mar 31 18:55:43 Newcastle kernel: Stack: 80170bc2 0019 0286
Mar 31 18:55:43 Newcastle kernel: 000a 80d2000a 0286 0256
Mar 31 18:55:43 Newcastle kernel: 80478bc0
Mar 31 18:55:43 Newcastle kernel: Call Trace: 80170bc2{buffered_rmqueue+1154} 80170dac{__alloc_pages+220}
Mar 31 18:55:43 Newcastle kernel: 80181c52{do_no_page+370} 801825c0{handle_mm_fault+560}
Mar 31 18:55:43 Newcastle kernel: 80284f9c{write_chan+860} 80123834{do_page_fault+1044}
Mar 31 18:55:43 Newcastle kernel: 803a3699{thread_return+41} 8010f58d{error_exit+0}
Mar 31 18:55:43 Newcastle kernel:
Mar 31 18:55:43 Newcastle kernel:
Mar 31 18:55:43 Newcastle kernel: Code: f3 48 ab c3 66 66 66 90 66 66 66 90 66 66 66 90 66 66 66 90
Mar 31 18:55:43 Newcastle kernel: RIP 80236ac7{clear_page+7} RSP 810078299ca0
Mar 31 18:55:43 Newcastle kernel: CR2: 8100588f5000

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
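As a reading aid for the oops above: the number after "Oops:" is the x86 page-fault error code in hex, and its low three bits say what kind of access faulted. The small decoder below is only a sketch (the struct and function names are invented here). For 0002 it reports a kernel-mode write to a page that was not present. The first Code: bytes, f3 48 ab, are the encoding of rep stosq and c3 is ret, so the fault appears to have hit the page-clearing string store itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct pf_error {
    int present;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;    /* bit 1: 1 = write access, 0 = read access */
    int user;     /* bit 2: 1 = CPU was in user mode, 0 = kernel mode */
};

/* Split the low three bits of a page-fault error code into flags. */
static struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;
    e.present = (int)((code >> 0) & 1);
    e.write   = (int)((code >> 1) & 1);
    e.user    = (int)((code >> 2) & 1);
    return e;
}

/* Render the decoded flags as a short human-readable sentence. */
static void format_pf_error(unsigned long code, char *buf, size_t len)
{
    struct pf_error e = decode_pf_error(code);

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s, %s, %s mode",
             e.write ? "write" : "read",
             e.present ? "protection violation" : "page not present",
             e.user ? "user" : "kernel");
}
```

Passing 0x0002 to format_pf_error() yields a description matching the "Unable to handle kernel paging request" line: a write, to a non-present page, from kernel mode.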
Re: AMD64 Machine hardlocks when using memset
> your memory timings are out of spec.

I don't know what spec applies here, don't really care. But when I backed off my Memory Timing from 1T to 2T, my box became stable running this memset() test.

So I am a happy camper, grateful that someone posted this nice test, and agree with you that it was a memory timing issue, at least for my system.

Apparently Philip's box has additional "issues". Whatever.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: AMD64 Machine hardlocks when using memset
On Thu, Mar 31, 2005 at 12:04:59AM +0200, Philip Lawatsch wrote:
> I do have a very strange problem:
>
> If I memset a ~1meg buffer some thousand times (in the userspace) it will hardlock my machine.
>
> I've been using 2.6.12-rc1 and also a lot of other kernels (2.6.9, 2.6.11). I've tried it both using a 32 bit kernel and a 64 bit kernel. When running on the 32 bit kernel the machine hardlocks after about 15000 iterations, on a 64 bit kernel the machine hardlocks after about 5000 (the 64 bit system has nearly no background jobs running).
>
> I've been running memcheck for several hours now but nothing did show up.
>
> I've got an Asus A8N-SLI board with 2 gigs of memory and an AMD 3500+ CPU.
>
> The 64 bit kernel was compiled using gcc 3.4.3 and the 32 bit kernel using 3.3.5.
[...]
> powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.00.09e)
> powernow-k8:0 : fid 0xe (2200 MHz), vid 0x6 (1400 mV)
> powernow-k8:1 : fid 0xc (2000 MHz), vid 0x8 (1350 mV)
> powernow-k8:2 : fid 0xa (1800 MHz), vid 0xa (1300 mV)
> powernow-k8:3 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
> cpu_init done, current fid 0xe, vid 0x6

Just a thought: does deactivating cpufreq change anything ? I haven't tested yet your program, but on my Asus K8NE-Deluxe very strange things happen if cpufreq/powernow is activated *and* the cpu frequency is changed...

Stelian.
--
Stelian Pop <[EMAIL PROTECTED]>
Re: AMD64 Machine hardlocks when using memset
Paul Jackson writes:
> Yup - kills my x86_64 too. I can't stay up for half a minute.
...
> My mainboard is an MSI K8N Neo2 Platinum.

I've tested both versions of the test program on two Athlon64 boxes, and neither has had any problems with them.

My two machines are both VIA K8T800-based (a desktop and a laptop), but it seems those of you who had problems have nForce-based machines. So presumably it's either the nForce chipset or your memory timings are out of spec.

/Mikael
Re: AMD64 Machine hardlocks when using memset
Your problem is almost certainly in the hardware area (cpu, bios, memory, power, northbridge, motherboard, cooling or thereabouts).

> Imo memtest86 should not hang unless something screws up [its] memory area

There is nothing else running when memtest runs.

You cannot assume that your hardware is operating like a sane digital computer when memtest hangs - the magic of zeros, ones and instruction set architectures is coming unglued and you are getting a glimpse of the ugliness that is usually hidden behind the curtain.

Good luck fixing it. LKML is probably not the place to continue to analyze this, now that you've recreated it with memtest as well.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: AMD64 Machine hardlocks when using memset
Paul Jackson wrote:
> Denis wrote:
>
>> This reminds me on VIA northbridge problem when BIOS enabled
>> a feature which was experimental and turned out to be buggy.
>
> You were close!
>
> I changed my Memory Timing from 1T to 2T, and now it is as solid as a rock. It has been up 7 minutes as I type this, without a hiccup.
>
> Notice this comment, at http://www.vr-zone.com.sg/?i=1641&p=1&s=0
>
>   Well as most Athlon 64 users know, 1T setting improves performance quite significantly over 2T, but it is also very taxing on the memory and quite a hit-and-miss when matching different memory with different boards. From some users' feedback, the Asus A8N SLI can be a little picky with 1T setting when overclocking, so results might be a little better with other boards.

I've now tried the most conservative settings available. The 32 bit kernel now hangs after about 15 iterations (compared to about 16000 before) but the 64 bit kernel still hangs after about 5000.

After a ~12 hour memtest86 run, memtest86 crashed (!), filling the console with some garbage characters and then hanging.

This is driving me crazy. Imo memtest86 should not hang unless something screws up the memory area it is loaded into.

I've also tried the newest beta BIOS for the board now; it didn't change anything.

kind regards Philip
Re: AMD64 Machine hardlocks when using memset
Denis wrote:
> This reminds me on VIA northbridge problem when BIOS enabled
> a feature which was experimental and turned out to be buggy.

You were close!

I changed my Memory Timing from 1T to 2T, and now it is as solid as a rock. It has been up 7 minutes as I type this, without a hiccup.

Notice this comment, at http://www.vr-zone.com.sg/?i=1641&p=1&s=0

  Well as most Athlon 64 users know, 1T setting improves performance quite significantly over 2T, but it is also very taxing on the memory and quite a hit-and-miss when matching different memory with different boards. From some users' feedback, the Asus A8N SLI can be a little picky with 1T setting when overclocking, so results might be a little better with other boards.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: AMD64 Machine hardlocks when using memset
Yup - kills my x86_64 too. I can't stay up for half a minute.

I got a couple of Oops:

    Unable to handle kernel paging request at 2730 RIP:
    Unable to handle kernel paging request at 81773ffc6918 RIP:

The first try ended with a sudden reboot. The second time, I ctrl-C'd out while I still had a responsive system.

I thought it might be a CPU temperature issue, so downloaded XMBmon "Mother Board Monitor Program for X Window System", and hacked the command line mbmon in it to add this memset loop and report the CPU temp each time around the loop. My CPU Temp went from its usual 39 C idle, to 45 C during the memset loop, which are typical temperatures for this PC. No problem there.

In a couple more tries, I got:

    knotify killed with a SIGSEGV
    artsd killed with a SIGSEGV
    a hard lockup, requiring the big red button
    a second oops at the same 81773ffc6918 as above.

My CPU, from /proc/cpuinfo, is:

    model name : AMD Athlon(tm) 64 Processor 3500+

My mainboard is an MSI K8N Neo2 Platinum.

I have 1 GByte of Corsair XMS DDR400 memory.

I am not overclocking and I am running with standard voltages.

This is on a 2.6.11-rc5 kernel, though I doubt that matters.

I'm guessing it's hardware.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: AMD64 Machine hardlocks when using memset
On Thursday 31 March 2005 07:38, Robert Hancock wrote:
> Philip Lawatsch wrote:
> > Hi,
> >
> > I do have a very strange problem:
> >
> > If I memset a ~1meg buffer some thousand times (in the userspace) it
> > will hardlock my machine.
>
> I thought that this must be impossible, but I tried it on my machine which is very similar (Asus A8N-SLI, Athlon 64 3500+, 2GB RAM) and to my surprise it breaks on mine too with kernel 2.6.11. I tested using the program below. After about a minute or so of this, the machine either locked hard or rebooted spontaneously. When it locked, there was no oops message, the NMI watchdog was not triggered and there was no response to SysRq commands. (I tested it with and without the NVIDIA module loaded.)
>
> This seems pretty terrible, a perfectly legal program running as a normal user is hard-locking the machine. Anyone have any suggestions to debug this? Also, can somebody else on an x86_64 try and duplicate this?
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main( int argc, char* argv[] )
> {
> 	char* test = malloc(512*1024*1024);
> 	int i;
> 	for( i=0; i<100; i++ )
> 	{
> 		memset( test, 0, 512*1024*1024);
> 	}
> 	free(test);
> 	return 0;
> }

This reminds me of a VIA northbridge problem where the BIOS enabled a feature which was experimental and turned out to be buggy. It was causing oopses ONLY on K7 optimized kernels because of the movntq stores used. They seem to put an awful lot of writes on the bus.
--
vda
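Since several reports in this thread quote "iterations until hang", a variant of the test program that announces each pass can make those numbers easier to collect. This is only a sketch: run_memset_loop is a made-up name, the malloc check and the volatile read-back are additions that the original program does not have.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* memset a buffer of `size` bytes `passes` times, reporting each pass.
 * Returns the number of completed passes, or -1 if malloc failed. */
static int run_memset_loop(size_t size, int passes)
{
    char *buf;
    int done = 0;

    assert(size > 0 && passes >= 0);

    buf = malloc(size);
    if (buf == NULL)
        return -1;

    for (int i = 0; i < passes; i++) {
        memset(buf, 0, size);
        /* Read one byte back through a volatile pointer so the compiler
         * is less likely to discard the memset as a dead store. */
        (void)*(volatile char *)(buf + size - 1);
        done++;
        /* stderr is unbuffered, so the last line printed before a hang
         * shows how many passes actually completed. */
        fprintf(stderr, "pass %d done\n", done);
    }

    free(buf);
    return done;
}
```

Calling run_memset_loop(512UL*1024*1024, 100) from main() mirrors the original parameters; on a serial console or over ssh the final "pass N done" line survives a hard lock.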
Re: AMD64 Machine hardlocks when using memset
Philip Lawatsch wrote:
> Hi,
>
> I do have a very strange problem:
>
> If I memset a ~1meg buffer some thousand times (in the userspace) it
> will hardlock my machine.

I thought that this must be impossible, but I tried it on my machine which is very similar (Asus A8N-SLI, Athlon 64 3500+, 2GB RAM) and to my surprise it breaks on mine too with kernel 2.6.11. I tested using the program below. After about a minute or so of this, the machine either locked hard or rebooted spontaneously. When it locked, there was no oops message, the NMI watchdog was not triggered and there was no response to SysRq commands. (I tested it with and without the NVIDIA module loaded.)

This seems pretty terrible, a perfectly legal program running as a normal user is hard-locking the machine. Anyone have any suggestions to debug this? Also, can somebody else on an x86_64 try and duplicate this?

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main( int argc, char* argv[] )
    {
        char* test = malloc(512*1024*1024);
        int i;
        for( i=0; i<100; i++ )
        {
            memset( test, 0, 512*1024*1024);
        }
        free(test);
        return 0;
    }

Bootdata ok (command line is ro root=LABEL=/)
Linux version 2.6.11-1.7_FC3custom ([EMAIL PROTECTED]) (gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 Thu Mar 24 21:23:17 CST 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
 BIOS-e820: 7fff3000 - 8000 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: fefffc00 - ff00 (reserved)
 BIOS-e820: - 0001 (reserved)
ACPI: RSDP (v000 Nvidia) @ 0x000f7d50
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff3040
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff30c0
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9640
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9580
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x1000 MSFT 0x010e) @ 0x
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 520176 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Nvidia board detected. Ignoring ACPI timer override.
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Checking aperture...
CPU 0: aperture @ 132000 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
Built 1 zonelists
Kernel command line: ro root=LABEL=/ console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2211.365 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2055568k/2097088k available (2722k kernel code, 40732k reserved, 1239k data, 188k init)
Calibrating delay loop... 4374.52 BogoMIPS (lpj=2187264)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3500+ stepping 00
Using local APIC NMI watchdog using perfctr0
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
checking if image is initramfs... it is
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050211
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - :00:09.0
ACPI: PCI
Re: AMD64 Machine hardlocks when using memset
Matthias-Christian Ott wrote:
> You want to allocate a lot of memory (16 GB), you don't have that much
> space, so the Kernel hangs.

No, this is not what it is doing. The program is simply wiping the same 1MB block of memory over and over. If it was doing what you say it would not (or should not) lock the machine anyway.

-- 
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
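The 16 GB figure and the 1MB figure in this exchange both come from the loop bounds of Philip's test program (1024*16 passes over a 1024*1024 byte buffer): one is the total number of bytes written over the whole run, the other is the memory actually held at any time. A small sketch of that arithmetic follows; the helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Total bytes written by `iterations` memset calls of `block_bytes` each. */
uint64_t total_bytes_written(uint64_t iterations, uint64_t block_bytes)
{
    return iterations * block_bytes;
}

/* Memory held at any one time: the same block is reused on every pass,
 * so the footprint does not grow with the iteration count. */
uint64_t peak_buffer_bytes(uint64_t iterations, uint64_t block_bytes)
{
    (void)iterations;   /* deliberately unused: the footprint is constant */
    return block_bytes;
}
```

With 16384 passes over 1048576 bytes, the first function gives 2^34 bytes (16 GiB) of write traffic, while the second stays at 2^20 bytes (1 MiB), which is why the program cannot exhaust memory.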
AMD64 Machine hardlocks when using memset
Hi,

I do have a very strange problem:

If I memset a ~1meg buffer some thousand times (in the userspace) it will hardlock my machine.

I've been using 2.6.12-rc1 and also a lot of other kernels (2.6.9, 2.6.11). I've tried it both using a 32 bit kernel and a 64 bit kernel.

When running on the 32 bit kernel the machine hardlocks after about 15000 iterations, on a 64 bit kernel the machine hardlocks after about 5000 (the 64 bit system has nearly no background jobs running).

I've been running memcheck for several hours now but nothing did show up.

I've got an Asus A8N-SLI board with 2 gigs of memory and an AMD 3500+ CPU.

The 64 bit kernel was compiled using gcc 3.4.3 and the 32 bit kernel using 3.3.5.

This simple program will kill my machine:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>   /* declares memset */

    int main(int argc, char *argv[])
    {
        char buf[1024*1024];
        int i;
        for (i=0;i<1024*16;++i)
        {
            printf("%d\n",i);
            memset(buf,0,1024*1024);
        }
        printf("Done\n");
        return 0;
    }

If I usleep for 1ms after each memset the whole thing will happily run forever without any problems.

Also if I start it twice (without sleeping in the loop) the machine won't hardlock either (tested with a 32 bit kernel).

I'd really appreciate any pointers as to what might be wrong here.

I've tried both kernels with and without preemption.

kind regards Philip

Bootdata ok (command line is BOOT_IMAGE=test ro root=809)
Linux version 2.6.12-rc1 ([EMAIL PROTECTED]) (gcc version 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)) #1 Wed Mar 30 23:30:20 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
 BIOS-e820: 7fff3000 - 8000 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fef0 (reserved)
 BIOS-e820: fefffc00 - ff00 (reserved)
 BIOS-e820: - 0001 (reserved)
ACPI: RSDP (v000 Nvidia) @ 0x000f78c0
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff3040
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff30c0
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9540
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x) @ 0x7fff9480
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x1000 MSFT 0x010e) @ 0x
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 520176 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Nvidia board detected. Ignoring ACPI timer override.
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: BOOT_IMAGE=test ro root=809 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2211.376 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2056168k/2097088k available (3281k kernel code, 40236k reserved, 1386k data, 188k init)
Calibrating delay loop... 4374.52 BogoMIPS (lpj=2187264)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3500+ stepping 00
Using local APIC NMI watchdog using perfctr0
Using local APIC timer interrupts.
Detected 12.564 MHz APIC timer.
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050211
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - :00:09.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI
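The message above reports that sleeping for 1ms after each memset avoids the lockup. A minimal sketch of that variant follows, written as a reusable function so the pause can be switched on and off. The name `memset_loop` and its parameters are assumptions for this illustration, and `nanosleep` is used here in place of the `usleep` call mentioned in the post.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep */

#include <stddef.h>
#include <string.h>
#include <time.h>

/* Clear `buf` `passes` times. If `pause_us` is greater than zero, sleep
 * that many microseconds after each pass. Returns the passes completed. */
int memset_loop(char *buf, size_t len, int passes, long pause_us)
{
    struct timespec pause;
    int done = 0;

    pause.tv_sec = pause_us / 1000000L;
    pause.tv_nsec = (pause_us % 1000000L) * 1000L;

    for (int i = 0; i < passes; i++) {
        memset(buf, 0, len);
        if (pause_us > 0)
            nanosleep(&pause, NULL);   /* gives the memory bus an idle gap */
        done++;
    }
    return done;
}
```

Calling it with a 1024*1024 byte buffer, 1024*16 passes and `pause_us` set to 0 corresponds to the failing case described in the post; setting `pause_us` to 1000 corresponds to the 1ms variant reported to run without problems.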