Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Fri, 22 Jul 2005 21:32:19 -0400, Robert Crawford wrote:

OK- if it doesn't happen during light computing stuff, and only with very cpu intensive stuff like compiling, I feel virtually certain it is a cpu heat issue. IMHO, there's not really any other reasonable explanation.

A duff power supply? That would also tend to suffer under heavy load.

-- Neil Bothwick

I must have slipped a disk; my pack hurts.
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Joseph wrote:

[snip]

Thank you for the suggestion. I'm re-installing Gentoo and will definitely run these tools. For sure I have some hardware memory problem, as my latest error message is:

Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: evdev via_rhine mii parport_pc parport ahci sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil sata_promise libata sbp2 ohci1394 ieee1394 usb_storage ohci_hcd uhci_hcd ehci_hcd usbcore
Pid: 5626, comm: rsync Not tainted 2.6.11-gentoo-r3-k8
RIP: 0010:[.

Though I ran memtest86 two days ago and 17 passes went by without any errors.

How many RAM modules do you have? Any spares? Maybe you can stress test them one at a time.

Zac

-- gentoo-user@gentoo.org mailing list
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Though I ran memtest86 two days ago and 17 passes went by without any errors.

How many RAM modules do you have? Any spares? Maybe you can stress test them one at a time.

Zac

I have two memory sticks, and yes, I ran them individually as well. Francesco made a good point, though, so I'll borrow two sticks from another machine and try running with those as well.

quote--
Sometimes memtest doesn't stress the hardware enough, see: http://people.redhat.com/dledford/memtest.html
--- end quote-

Bob also pointed me to some good Gentoo hardware utilities that I will hopefully be able to run tomorrow.

It is definitely a hardware problem. I'll let you know once I find out.

-- #Joseph
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Joseph wrote:

Though I ran memtest86 two days ago and 17 passes went by without any errors.

How many RAM modules do you have? Any spares? Maybe you can stress test them one at a time.

Zac

I have two memory sticks, and yes, I ran them individually as well. Francesco made a good point, though, so I'll borrow two sticks from another machine and try running with those as well.

quote--
Sometimes memtest doesn't stress the hardware enough, see: http://people.redhat.com/dledford/memtest.html
--- end quote-

Bob also pointed me to some good Gentoo hardware utilities that I will hopefully be able to run tomorrow.

It is definitely a hardware problem. I'll let you know once I find out.

Lots of good advice. I've been taking notes ;-).

Zac
[gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Fri, 2005-07-22 at 10:00 -0400, Robert Crawford wrote:

I have an old IDE drive; maybe I can squeeze Gentoo onto it for testing. Bob has a good idea too regarding the CPU compound under the heatsink, but at a CPU temp of 39C I don't see how that could cause any problem.

-- #Joseph

No matter what the temp sensors are reading, your problem definitely sounds like it's heat related. Temp sensor readings can be, and often are, inaccurate, sometimes to an amazing degree. I don't know what type of sensor your cpu uses, but if it's the type under the cpu, it might not be in good contact with the cpu itself, thus giving false readings. I wouldn't be surprised if your cpu temp was really over 50C. In my experience, temps over 50C with AMD 32-bit cpus start giving problems like this, no matter what AMD says about it. Seeing as how you have an AMD 64, I'm not sure about the sensor type; all I'm saying is that the readings can vary wildly and are not to be trusted, especially considering your current problems.

Robert Crawford

Now I tend to lean towards your explanation. It could be heat related. I've disabled the on-board network and added a standard PCI card on IRQ 3 (a separate IRQ), so the SATA controller has its own IRQ as well. The computer froze once during emerge sync without any error message, and once more, with an error, while emerging Apache.

-- #Joseph
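One way to test the heat theory without relying on a single idle reading is to log the temperature every few seconds while an emerge is running and then look at the peak. The sketch below only covers the summarising step. It assumes a log with one "time temperature" pair per line; that log format, the file name temps.log, and the names max_temp and read_cpu_temp are all made up for illustration, and how the reading is obtained (for example from the lm_sensors `sensors` command) depends on the board.

```shell
#!/bin/sh
# max_temp: read "HH:MM:SS temperature" pairs on stdin and print the
# highest temperature seen. The two-column log format is an assumption
# made for this sketch; no tool writes it by default.
max_temp() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0 } END { if (NR) print max }'
}

# Hypothetical usage: sample a reading every 5 seconds during a compile,
# then summarise afterwards. read_cpu_temp is a placeholder for whatever
# command prints the CPU temperature on your system.
#   while sleep 5; do echo "$(date +%T) $(read_cpu_temp)"; done > temps.log
#   max_temp < temps.log
```

If the peak under load is far above the idle 39C, that would support the heat explanation; if it barely moves, either the sensor is not tracking the die or heat is not the cause.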
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
If you are using the thermal pad, tape, or grease that came with the stock heatsink, you might try using some Arctic Silver compound instead. It's good for a 3-5C drop compared with the regular stuff. Sometimes even the AMD-approved stock heatsinks don't do the job, and you might need to get a better one (assuming heat is the problem). I build a lot of computers, and with AMD cpus, overkill in the cooling dept. is sometimes necessary.

Robert Crawford

On Friday 22 July 2005 03:31 pm, Joseph wrote:

On Fri, 2005-07-22 at 10:00 -0400, Robert Crawford wrote:

I have an old IDE drive; maybe I can squeeze Gentoo onto it for testing. Bob has a good idea too regarding the CPU compound under the heatsink, but at a CPU temp of 39C I don't see how that could cause any problem.

-- #Joseph

No matter what the temp sensors are reading, your problem definitely sounds like it's heat related. Temp sensor readings can be, and often are, inaccurate, sometimes to an amazing degree. I don't know what type of sensor your cpu uses, but if it's the type under the cpu, it might not be in good contact with the cpu itself, thus giving false readings. I wouldn't be surprised if your cpu temp was really over 50C. In my experience, temps over 50C with AMD 32-bit cpus start giving problems like this, no matter what AMD says about it. Seeing as how you have an AMD 64, I'm not sure about the sensor type; all I'm saying is that the readings can vary wildly and are not to be trusted, especially considering your current problems.

Robert Crawford

Now I tend to lean towards your explanation. It could be heat related. I've disabled the on-board network and added a standard PCI card on IRQ 3 (a separate IRQ), so the SATA controller has its own IRQ as well. The computer froze once during emerge sync without any error message, and once more, with an error, while emerging Apache.

-- #Joseph
[gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
[snip]

No, I still have the same SATA drive; I'm just playing with IRQ assignment and configuration. I've changed BIOS PnP to YES, so my skge (network controller) and libata (SATA controller) are shifted to IRQ 10. But it makes me wonder: both controllers on the motherboard are different chips, so why do they share an IRQ? Is there a way to shift them to different IRQs, since Linux controls IRQ assignment now?

A quick look through linux/Documentation/kernel-parameters.txt shows that many drivers support direct irq assignment. Also, linux/Documentation/pnp.txt may be of use. Considering the positive results that you've gotten so far, it seems like you may be on the right track here. It makes me less concerned about any possible overheating, but if you wanted to be paranoid about it, you could get another heat probe to double check the readings from the first one ;-).

Zac

Here is what I have done:

1.) Disabled the network controller on the motherboard and installed another one on the PCI bus. This eliminates a possible IRQ conflict, but it didn't help.
2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol and applied a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message: Kernel panic - not syncing: Aiee, killing interrupt handler.

The next option is to remove the SATA drive and install Gentoo on a standard IDE drive; this would rule out a SCSI problem and/or a buggy driver.

Does anybody have any other solutions?

-- #Joseph
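To check which devices really share an interrupt line, /proc/interrupts lists every IRQ together with the drivers registered on it, and drivers sharing a line normally appear comma-separated. A minimal sketch, assuming that layout (the function name shared_irqs is just for illustration):

```shell
#!/bin/sh
# shared_irqs: read /proc/interrupts-style text on stdin and print only
# the numbered IRQ lines that list more than one driver. Shared lines are
# recognised by the comma between driver names.
shared_irqs() {
    awk '/^ *[0-9]+:/ && /,/'
}

# On a running Linux system, show the shared lines (if any):
if [ -r /proc/interrupts ]; then
    shared_irqs < /proc/interrupts
fi
```

Running it before and after a BIOS or card change shows whether libata and the network driver have actually ended up on separate lines. Sharing an IRQ is normal on PCI and is not by itself evidence of a fault.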
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Fri, 2005-07-22 at 16:29 -0400, Robert Crawford wrote:

If you are using the thermal pad, tape, or grease that came with the stock heatsink, you might try using some Arctic Silver compound instead. It's good for a 3-5C drop compared with the regular stuff. Sometimes even the AMD-approved stock heatsinks don't do the job, and you might need to get a better one (assuming heat is the problem). I build a lot of computers, and with AMD cpus, overkill in the cooling dept. is sometimes necessary.

Robert Crawford

As I posted earlier:

--
Here is what I have done:

1.) Disabled the network controller on the motherboard and installed another one on the PCI bus. This eliminates a possible IRQ conflict, but it didn't help.
2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol and applied a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message: Kernel panic - not syncing: Aiee, killing interrupt handler.

The next option is to remove the SATA drive and install Gentoo on a standard IDE drive; this would rule out a SCSI problem and/or a buggy driver.
---

If the IDE drive does not solve the problem, I'll try the Arctic Silver compound as you suggest (or just run that useless box only during winter; here in Edmonton it sometimes gets down to -40C, which might help :-)

I'm simply running out of ideas.

[snip]

-- #Joseph
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Joseph,

Sorry- I haven't been reading this thread from the beginning, so I might have missed some of your first posts. If we eliminate heat as the problem (not saying we absolutely have), I'm starting to think it could be a misconfigured kernel, or a kernel bug itself.

What kernel are you currently using? It might be worth trying to compile a new one, making sure all config options are correct for your system.

One question (maybe you answered this before): have you booted a live CD like Knoppix or Slax to see whether the same problem occurs? If it does still happen, that would eliminate your kernel and/or hard drive as the source of the problem, and again focus back on the heat issue. If it doesn't happen after being booted to a live CD for several hours of heavy usage, that would eliminate the heat issue.

Robert
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Joseph wrote:

[snip]

No, I still have the same SATA drive; I'm just playing with IRQ assignment and configuration. I've changed BIOS PnP to YES, so my skge (network controller) and libata (SATA controller) are shifted to IRQ 10. But it makes me wonder: both controllers on the motherboard are different chips, so why do they share an IRQ? Is there a way to shift them to different IRQs, since Linux controls IRQ assignment now?

A quick look through linux/Documentation/kernel-parameters.txt shows that many drivers support direct irq assignment. Also, linux/Documentation/pnp.txt may be of use. Considering the positive results that you've gotten so far, it seems like you may be on the right track here. It makes me less concerned about any possible overheating, but if you wanted to be paranoid about it, you could get another heat probe to double check the readings from the first one ;-).

Zac

Here is what I have done:

1.) Disabled the network controller on the motherboard and installed another one on the PCI bus. This eliminates a possible IRQ conflict, but it didn't help.
2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol and applied a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message: Kernel panic - not syncing: Aiee, killing interrupt handler.

The next option is to remove the SATA drive and install Gentoo on a standard IDE drive; this would rule out a SCSI problem and/or a buggy driver.

Does anybody have any other solutions?

This isn't exactly a solution and it's just a stab in the dark, but... You're using -march=k8, if I recall. I've read that this causes (or used to cause) problems for some people. I believe it had to do with poor support by some versions of gcc. This is probably no longer the case, but I haven't heard one way or the other. I'm using -march=athlon64 without any trouble. I don't think you can change this flag after the initial installation, though.

-- Pluralitas non est ponenda sine necessitate - W. of O.
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
Robert Crawford wrote:

Joseph,

Sorry- I haven't been reading this thread from the beginning, so I might have missed some of your first posts. If we eliminate heat as the problem (not saying we absolutely have), I'm starting to think it could be a misconfigured kernel, or a kernel bug itself.

What kernel are you currently using? It might be worth trying to compile a new one, making sure all config options are correct for your system.

One question (maybe you answered this before): have you booted a live CD like Knoppix or Slax to see whether the same problem occurs? If it does still happen, that would eliminate your kernel and/or hard drive as the source of the problem, and again focus back on the heat issue. If it doesn't happen after being booted to a live CD for several hours of heavy usage, that would eliminate the heat issue.

Robert

Excellent idea, but I'd suggest Kanotix 64 instead of Knoppix or Slax, unless there are 64-bit versions of those that I don't know about.

-- Pluralitas non est ponenda sine necessitate - W. of O.
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
[snip]

Here is what I have done:

1.) Disabled the network controller on the motherboard and installed another one on the PCI bus. This eliminates a possible IRQ conflict, but it didn't help.
2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol and applied a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message: Kernel panic - not syncing: Aiee, killing interrupt handler.

3.) The next option is to remove the SATA drive and install Gentoo on a standard IDE drive; this would rule out a SCSI problem and/or a buggy driver.

Does anybody have any other solutions?

This isn't exactly a solution and it's just a stab in the dark, but... You're using -march=k8, if I recall. I've read that this causes (or used to cause) problems for some people. I believe it had to do with poor support by some versions of gcc. This is probably no longer the case, but I haven't heard one way or the other. I'm using -march=athlon64 without any trouble. I don't think you can change this flag after the initial installation, though.

-- Pluralitas non est ponenda sine necessitate - W. of O.

Well, you were right, it was a stab in the dark, and I missed. I removed the SATA drive, put in a 15GB IDE drive and started from scratch. I didn't even get a chance to finish everything in the handbook: when I got to section 9.d. (Optional: File Indexing) and tried to emerge slocate, I of course got a Kernel panic - not syncing: Aiee, killing interrupt handler.

So I've eliminated options 1.), 2.) and 3.) (the IDE drive, see above). My next options, in order, would be:

4.) -march=athlon64 in make.conf instead of -march=k8 as the handbook suggests (I'll start from scratch again), as you suggest
5.) Arctic Silver compound, as per Robert's suggestion
6.) replace the motherboard
7.) replace the CPU
8.) go to Windows XP :-) (I have to keep a sense of humor to keep my sanity, as I've been trying to solve this problem for a week)

-- #Joseph
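For option 4.), the change would look roughly like the fragment below. Only the -march value comes from the thread; the remaining flags ("-O2 -pipe") are an assumed, conservative placeholder and should be whatever is already in use. As far as the GCC documentation goes, k8 and athlon64 are listed as names for the same target, so this swap may well make no difference, and it should not require a reinstall: the setting only affects packages compiled after the edit.

```shell
# make.conf fragment (sketch). -march=athlon64 is the flag suggested in
# the thread; "-O2 -pipe" is an assumption for illustration only.
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```

Packages built earlier keep their old flags until they are re-emerged.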
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Fri, 2005-07-22 at 17:24 -0400, Robert Crawford wrote:

Joseph,

Sorry- I haven't been reading this thread from the beginning, so I might have missed some of your first posts. If we eliminate heat as the problem (not saying we absolutely have), I'm starting to think it could be a misconfigured kernel, or a kernel bug itself.

I'm just following the AMD64 handbook instructions, and everything boots OK after installation (if I don't get the kernel panic).

What kernel are you currently using? It might be worth trying to compile a new one, making sure all config options are correct for your system.

I'm on gentoo-sources 2.6.12-r6 (the newest one).

One question (maybe you answered this before): have you booted a live CD like Knoppix or Slax to see whether the same problem occurs? If it does still happen, that would eliminate your kernel and/or hard drive as the source of the problem, and again focus back on the heat issue. If it doesn't happen after being booted to a live CD for several hours of heavy usage, that would eliminate the heat issue.

It does not happen when I do light computing, only during compilation, when I'm emerging something.

-- #Joseph
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Fri, 22 Jul 2005 17:52:09 -0600 Joseph [EMAIL PROTECTED] wrote:

Does anybody have any other solutions?

There are a few tools that will allow you to do some diagnosing. These will isolate your hard drive and drive controllers:

app-benchmarks/bonnie (2.0.6): Performance Test of Filesystem I/O using standard C library calls.
app-benchmarks/bonnie++ (1.93c): Hard drive bottleneck testing benchmark suite.

If it is the motherboard, it should fall over pretty quickly.

Another tool I like is:

app-benchmarks/stress (0.18.6): Imposes stressful loads on different aspects of the system.

You'll have to add "app-benchmarks/stress x86" to your /etc/portage/package.keywords, as the ebuild doesn't have the amd64 keyword. It builds and runs fine.

Stress allows you to load all or parts of the system for a defined period of time. It's even possible to run the system out of resources. It's a really nice test of system stability. It covers everything except the X server, and that's easy to add by running 3 of the rss-glx screensavers from a term while running stress. And if you make the virtual memory component large enough at runtime, the system will start swapping.

This line will get the load up to about 20 and cause about 500MB of swapping on a 1P amd64 system with 1 GB of main memory:

stress --cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout 60s -d 2

Change the timeout to somewhere around 5 to 10 minutes (300 to 600 seconds). Get a tail -f /var/log/messages going, or use root-tail, and get a top running in another term.

Bob
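The stress run described in the post above can be wrapped so that only the timeout changes between runs. This is a sketch, not part of the stress package: the helper name stress_args is invented here, and the flags are copied unchanged from the post.

```shell
#!/bin/sh
# stress_args: print the option string from the post above, with the
# timeout in seconds as the only parameter (default 300 s = 5 minutes).
stress_args() {
    echo "--cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout ${1:-300}s -d 2"
}

# Hypothetical session, one command per terminal:
#   tail -f /var/log/messages      # watch for kernel messages
#   top                            # watch the load
#   stress $(stress_args 600)      # 10-minute run (left unquoted on purpose)
```

Starting with a short timeout and lengthening it makes it easier to see roughly how long the machine survives under load before it panics.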
Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
On Friday 22 July 2005 07:57 pm, Joseph wrote:

On Fri, 2005-07-22 at 17:24 -0400, Robert Crawford wrote:

Joseph,

Sorry- I haven't been reading this thread from the beginning, so I might have missed some of your first posts. If we eliminate heat as the problem (not saying we absolutely have), I'm starting to think it could be a misconfigured kernel, or a kernel bug itself.

I'm just following the AMD64 handbook instructions, and everything boots OK after installation (if I don't get the kernel panic).

What kernel are you currently using? It might be worth trying to compile a new one, making sure all config options are correct for your system.

I'm on gentoo-sources 2.6.12-r6 (the newest one).

One question (maybe you answered this before): have you booted a live CD like Knoppix or Slax to see whether the same problem occurs? If it does still happen, that would eliminate your kernel and/or hard drive as the source of the problem, and again focus back on the heat issue. If it doesn't happen after being booted to a live CD for several hours of heavy usage, that would eliminate the heat issue.

It does not happen when I do light computing, only during compilation, when I'm emerging something.

-- #Joseph

OK- if it doesn't happen during light computing stuff, and only with very cpu intensive stuff like compiling, I feel virtually certain it is a cpu heat issue. IMHO, there's not really any other reasonable explanation.

Have you investigated the cpu voltage setting in the BIOS (if your Asus board has one to adjust it)? It's probably set at the default for the cpu, but if it's set too high, that will cause cpu overheating, especially with a borderline heatsink/fan. I'm not sure what the default voltage of your cpu should be. Look it up on the net, or read it from the code printed on the cpu, if you have it. For example, the Athlon 64 3200+ has a default operating voltage of 1.50 volts. This link shows an example of the cpu codes written on the chips, and what the markings mean:

http://www.digital-daily.com/cpu/amd-athlon64/

Robert
Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interupt handler
[snip]

Thank you for the suggestion. I'm re-installing Gentoo and will definitely run these tools. For sure I have some hardware memory problem, as my latest error message is:

Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: evdev via_rhine mii parport_pc parport ahci sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil sata_promise libata sbp2 ohci1394 ieee1394 usb_storage ohci_hcd uhci_hcd ehci_hcd usbcore
Pid: 5626, comm: rsync Not tainted 2.6.11-gentoo-r3-k8
RIP: 0010:[.

Though I ran memtest86 two days ago and 17 passes went by without any errors.

-- #Joseph

There are a few tools that will allow you to do some diagnosing. These will isolate your hard drive and drive controllers:

app-benchmarks/bonnie (2.0.6): Performance Test of Filesystem I/O using standard C library calls.
app-benchmarks/bonnie++ (1.93c): Hard drive bottleneck testing benchmark suite.

If it is the motherboard, it should fall over pretty quickly.

Another tool I like is:

app-benchmarks/stress (0.18.6): Imposes stressful loads on different aspects of the system.

You'll have to add "app-benchmarks/stress x86" to your /etc/portage/package.keywords, as the ebuild doesn't have the amd64 keyword. It builds and runs fine.

Stress allows you to load all or parts of the system for a defined period of time. It's even possible to run the system out of resources. It's a really nice test of system stability. It covers everything except the X server, and that's easy to add by running 3 of the rss-glx screensavers from a term while running stress. And if you make the virtual memory component large enough at runtime, the system will start swapping.

This line will get the load up to about 20 and cause about 500MB of swapping on a 1P amd64 system with 1 GB of main memory:

stress --cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout 60s -d 2

Change the timeout to somewhere around 5 to 10 minutes (300 to 600 seconds). Get a tail -f /var/log/messages going, or use root-tail, and get a top running in another term.

Bob