Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-24 Thread Neil Bothwick
On Fri, 22 Jul 2005 21:32:19 -0400, Robert Crawford wrote:

 OK, if it doesn't happen during light computing, and only with
 very CPU-intensive stuff like compiling, I feel virtually certain it is
 a CPU heat issue.  IMHO, there's not really any other reasonable
 explanation.

A duff power supply? That would also tend to suffer under heavy load.


-- 
Neil Bothwick

I must have slipped a disk; my pack hurts.




Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-23 Thread Zac Medico

Joseph wrote:

[snip]

Thank you for the suggestions; I'm reinstalling Gentoo and will definitely run
these tools.

I definitely have some kind of hardware memory problem; my latest error
message is:

Uhhuh. NMI received. Dazed and confused, but trying to continue 
You probably have a hardware problem with your RAM chips

NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: evdev via_rhine mii parport_pc parport ahci sata_uli
sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil sata_promise libata
sbp2 ohci1394 ieee1394 usb_storage ohci_hcd uhci_hcd ehci_hcd usbcore
Pid: 5626, comm: rsync Not tainted 2.6.11-gentoo-r3-k8
RIP: 0010:[.

Though I ran memtest86 two days ago, and 17 passes completed without any
errors.
 


How many RAM modules do you have? Any spares?  Maybe you can stress test them
one at a time.

Zac
--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-23 Thread Joseph
  Though I ran memtest86 two days ago, and 17 passes completed without any
  errors.
   
 
 How many RAM modules do you have? Any spares?  Maybe you can stress test them
 one at a time.
 
 Zac

I have two memory sticks, and yes, I ran them individually as well.
Francesco made a good point, though.  I'll borrow two sticks from another
machine and test with those as well.
 quote--
Sometimes memtest doesn't stress the hardware enough, see:
http://people.redhat.com/dledford/memtest.html
--- end quote-
Bob also pointed me to some good Gentoo hardware utilities that
I may be able to run tomorrow.
It is definitely a hardware problem.
I'll let you know once I find out.
 
-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-23 Thread Zac Medico

Joseph wrote:

Though I ran memtest86 two days ago, and 17 passes completed without any
errors.



How many RAM modules do you have? Any spares?  Maybe you can stress test them
one at a time.

Zac



I have two memory sticks, and yes, I ran them individually as well.
Francesco made a good point, though.  I'll borrow two sticks from another
machine and test with those as well.
 quote--
Sometimes memtest doesn't stress the hardware enough, see:
http://people.redhat.com/dledford/memtest.html
--- end quote-
Bob also pointed me to some good Gentoo hardware utilities that
I may be able to run tomorrow.
It is definitely a hardware problem.
I'll let you know once I find out.
 


Lots of good advice.  I've been taking notes ;-).

Zac
--
gentoo-user@gentoo.org mailing list



[gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
On Fri, 2005-07-22 at 10:00 -0400, Robert Crawford wrote:
   I have an old IDE drive; maybe I can squeeze Gentoo onto it for testing.
   Bob has a good idea too, regarding the thermal compound under the heatsink,
   but with a CPU temperature of 39C I don't see how that could cause any problem.
  
   --
   #Joseph
 
 No matter what the temp sensors are reading, your problem definitely sounds
 like it's heat related. Temperature sensor readings can be, and often are,
 inaccurate, sometimes to an amazing degree. I don't know what type of sensor
 your CPU uses, but if it's the type under the CPU, it might not be in good
 contact with the CPU itself and could give false readings. I wouldn't be
 surprised if your CPU temperature was really over 50C. In my experience,
 temperatures over 50C with 32-bit AMD CPUs start giving problems like this,
 no matter what AMD says about it. Since you have an AMD64, I'm not sure about
 the sensor type; all I'm saying is that the readings can vary wildly and are
 not to be trusted, especially considering your current problems.
 
 Robert Crawford

Now I tend to lean towards your explanation.
It could be heat related.
I've disabled the on-board network controller and added a standard PCI card on IRQ 3
(a separate IRQ), so the SATA controller has its own IRQ as well.

The computer froze once during emerge sync without any error
message, and once with an error during emerge apache.

-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Robert Crawford
If you are using the thermal pad, tape, or grease that came with the stock
heatsink, you might try using some Arctic Silver compound instead. It's good
for a 3-5C drop compared with the regular stuff. Sometimes even the AMD-approved
stock heatsinks don't do the job, and you might need to get a better one
(assuming heat is the problem). I build a lot of computers, and with AMD
CPUs, overkill in the cooling department is sometimes necessary.

Robert Crawford

On Friday 22 July 2005 03:31 pm, Joseph wrote:
 On Fri, 2005-07-22 at 10:00 -0400, Robert Crawford wrote:
I have an old IDE drive; maybe I can squeeze Gentoo onto it for
testing. Bob has a good idea too, regarding the thermal compound under the
heatsink, but with a CPU temperature of 39C I don't see how that could cause any
problem.
   
--
#Joseph
 
  No matter what the temp sensors are reading, your problem definitely
  sounds like it's heat related. Temperature sensor readings can be, and often
  are, inaccurate, sometimes to an amazing degree. I don't know what type of
  sensor your CPU uses, but if it's the type under the CPU, it might not be
  in good contact with the CPU itself and could give false readings. I
  wouldn't be surprised if your CPU temperature was really over 50C. In my
  experience, temperatures over 50C with 32-bit AMD CPUs start giving
  problems like this, no matter what AMD says about it. Since you have an
  AMD64, I'm not sure about the sensor type; all I'm saying is that the
  readings can vary wildly and are not to be trusted, especially
  considering your current problems.
 
  Robert Crawford

 Now I tend to lean towards your explanation.
 It could be heat related.
 I've disabled the on-board network controller and added a standard PCI card on IRQ 3
 (a separate IRQ), so the SATA controller has its own IRQ as well.

 The computer froze once during emerge sync without any error
 message, and once with an error during emerge apache.

 --
 #Joseph
-- 
gentoo-user@gentoo.org mailing list



[gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
[snip]

  No, I still have the same SATA drive; I'm just playing with IRQ
  assignment and configuration.
  I've changed the BIOS PnP setting to YES, so my skge (network controller) and
  libata (SATA controller) are shifted to IRQ 10.
  
  But it makes me wonder: both controllers on the motherboard are different
  chips, so why do they share an IRQ?  Is there a way to shift them to a
  different IRQ, since Linux controls IRQ assignment now?
  
 
 A quick look through linux/Documentation/kernel-parameters.txt shows that 
 many drivers support direct irq assignment.  Also, 
 linux/Documentation/pnp.txt may be of use.
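To check whether two controllers really share a line (for example before and after changing the BIOS PnP setting or moving the NIC to a PCI slot), the kernel's own view in /proc/interrupts is the place to look. A minimal sketch, assuming the 2.6-era format where shared handlers are printed at the end of a row as a comma-separated list such as "libata, skge"; the function name shared_irqs is hypothetical, not a standard tool:

```shell
# shared_irqs (hypothetical helper): list IRQ lines that more than one driver
# has registered on, reading /proc/interrupts or a saved copy of it.
# Assumes shared handlers appear at the end of a row as "name, name[, ...]".
shared_irqs() {
    awk '
        # Numbered IRQ rows only (skips the CPU header and NMI/LOC/ERR rows),
        # and only when the row ends in a comma-separated device list.
        $1 ~ /^[0-9]+:$/ && match($0, /[^ ]+(, [^ ]+)+$/) {
            irq = $1
            sub(/:$/, "", irq)
            print "IRQ " irq ": " substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/interrupts}"
}
```

Run it as shared_irqs with no argument to read the live table, or pass a saved copy of /proc/interrupts to compare two boots.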
 
 Considering the positive results that you've gotten so far, it seems like you 
 may be on the right track here.  It makes me less concerned about any 
 possible overheating, but if you wanted to be paranoid about it, you could 
 get another heat probe to double check the readings from the first one ;-).
 
 Zac

Here is what I have done:
1.) Disabled the network controller on the motherboard and installed another
one on the PCI bus; this eliminates a possible IRQ conflict.
But it didn't help.

2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol, and applied
a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message:
Kernel panic - not syncing: Aiee, killing interrupt handler!

My next option is to remove the SATA drive and install Gentoo on a
standard IDE drive; this would rule out a SCSI problem and/or a buggy
driver.

Does anybody have any other solutions?

-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
On Fri, 2005-07-22 at 16:29 -0400, Robert Crawford wrote:
 If you are using the thermal pad, tape, or grease that came with the stock
 heatsink, you might try using some Arctic Silver compound instead. It's good
 for a 3-5C drop compared with the regular stuff. Sometimes even the AMD-approved
 stock heatsinks don't do the job, and you might need to get a better one
 (assuming heat is the problem). I build a lot of computers, and with AMD
 CPUs, overkill in the cooling department is sometimes necessary.
 
 Robert Crawford
 

As I posted earlier:
--
Here is what I have done:
1.) Disabled the network controller on the motherboard and installed another
one on the PCI bus; this eliminates a possible IRQ conflict.
But it didn't help.

2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol, and applied
a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message:
Kernel panic - not syncing: Aiee, killing interrupt handler!

My next option is to remove the SATA drive and install Gentoo on a
standard IDE drive; this would rule out a SCSI problem and/or a buggy
driver.
---

If the IDE drive doesn't solve the problem, I'll try the Arctic Silver
compound as you suggest (or just run this useless box only during
winter; here in Edmonton it sometimes gets down to -40C, so it might
help :-)

I'm simply running out of ideas.

[snip]
-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Robert Crawford
Joseph,
Sorry, I haven't been reading this thread from the beginning, so I might have
missed some of your first posts.

If we eliminate heat as the problem (not saying we absolutely have), I'm
starting to think it could be a misconfigured kernel, or a kernel bug itself.

What kernel are you currently using?  It might be worth compiling a
new one, making sure all config options are correct for your system.

One question (maybe you answered this before):
Have you booted a live CD like Knoppix or Slax, and does the same problem
occur?  If it does still happen, that would eliminate your kernel and/or
hard drive as the source of the problem, and again focus back on the heat
issue.

If it doesn't happen after several hours of heavy usage on a live CD,
that would eliminate the heat issue.

Robert


-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Matt Randolph

Joseph wrote:


[snip]

 


No, I still have the same SATA drive; I'm just playing with IRQ
assignment and configuration.
I've changed the BIOS PnP setting to YES, so my skge (network controller) and
libata (SATA controller) are shifted to IRQ 10.

But it makes me wonder: both controllers on the motherboard are different
chips, so why do they share an IRQ?  Is there a way to shift them to a
different IRQ, since Linux controls IRQ assignment now?

 


A quick look through linux/Documentation/kernel-parameters.txt shows that many 
drivers support direct irq assignment.  Also, linux/Documentation/pnp.txt may 
be of use.

Considering the positive results that you've gotten so far, it seems like you 
may be on the right track here.  It makes me less concerned about any possible 
overheating, but if you wanted to be paranoid about it, you could get another 
heat probe to double check the readings from the first one ;-).

Zac
   



Here is what I have done:
1.) Disabled the network controller on the motherboard and installed another
one on the PCI bus; this eliminates a possible IRQ conflict.
But it didn't help.

2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol, and applied
a thin layer of new heatsink grease.

Nothing helped; I'm still getting that message:
Kernel panic - not syncing: Aiee, killing interrupt handler!

My next option is to remove the SATA drive and install Gentoo on a
standard IDE drive; this would rule out a SCSI problem and/or a buggy
driver.

Does anybody have any other solutions?

 


This isn't exactly a solution and it's just a stab in the dark, but...

You're using -march=k8, if I recall.  I've read that this causes (or
used to cause) problems for some people.  I believe it had to do with
poor support by some versions of gcc.  This is probably no longer the
case, but I haven't heard one way or the other.  I'm using
-march=athlon64 without any trouble.  I don't think you can change this
flag after the initial installation, though.
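Expressed as make.conf settings, the suggested change touches only the -march value. This is a sketch: -O2 and -pipe are assumed placeholders for whatever else is already in CFLAGS. Note that GCC documents k8, opteron, athlon64 and athlon-fx as names for the same K8 target, so switching between those spellings is not expected to change the generated code, and a new CFLAGS value only affects packages compiled after the change.

```shell
# /etc/make.conf fragment (sketch): only the -march value differs from the
# handbook's -march=k8. -O2 and -pipe are assumed placeholders.
CFLAGS="-O2 -march=athlon64 -pipe"
# C++ packages normally use the same flags.
CXXFLAGS="${CFLAGS}"
```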


--
Pluralitas non est ponenda sine necessitate - W. of O.

--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Matt Randolph

Robert Crawford wrote:


Joseph,
Sorry, I haven't been reading this thread from the beginning, so I might have
missed some of your first posts.


If we eliminate heat as the problem (not saying we absolutely have), I'm
starting to think it could be a misconfigured kernel, or a kernel bug itself.


What kernel are you currently using?  It might be worth compiling a
new one, making sure all config options are correct for your system.


One question (maybe you answered this before):
Have you booted a live CD like Knoppix or Slax, and does the same problem
occur?  If it does still happen, that would eliminate your kernel and/or
hard drive as the source of the problem, and again focus back on the heat
issue.


If it doesn't happen after several hours of heavy usage on a live CD,
that would eliminate the heat issue.


Robert


 

Excellent idea, but I'd suggest Kanotix 64 instead of Knoppix or Slax.  
Unless there are 64-bit versions of those that I don't know about.


--
Pluralitas non est ponenda sine necessitate - W. of O.

--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
[snip]

 Here is what I have done:
 1.) Disabled the network controller on the motherboard and installed another
 one on the PCI bus; this eliminates a possible IRQ conflict.
 But it didn't help.
 
 2.) Removed the heatsink, cleaned it with 99% isopropyl alcohol, and applied
 a thin layer of new heatsink grease.
 
 Nothing helped; I'm still getting that message:
 Kernel panic - not syncing: Aiee, killing interrupt handler!
 
 3.) My next option is to remove the SATA drive and install Gentoo on a
 standard IDE drive; this would rule out a SCSI problem and/or a buggy
 driver.
 
 Does anybody have any other solutions?
 
   
 
 This isn't exactly a solution and it's just a stab in the dark, but...
 
 You're using -march=k8, if I recall.  I've read that this causes (or
 used to cause) problems for some people.  I believe it had to do with
 poor support by some versions of gcc.  This is probably no longer the
 case, but I haven't heard one way or the other.  I'm using
 -march=athlon64 without any trouble.  I don't think you can change this
 flag after the initial installation, though.
 
 -- 
 Pluralitas non est ponenda sine necessitate - W. of O.
 

Well, you were right, it was a stab in the dark, and I missed.
I removed the SATA drive, put in a 15GB IDE drive, and started from
scratch.
I hadn't even finished the installation from the handbook: when I got
to step 9.d (Optional: File Indexing) and tried to emerge slocate, of course
I got a
Kernel panic - not syncing: Aiee, killing interrupt handler!

I've eliminated options 1.), 2.), and 3.) (the IDE drive; see above).

So next I will try:
4.)  -march=athlon64 in make.conf, as you suggest, instead of -march=k8 as
the handbook suggests (I'll start from scratch again).
5.)  Arctic Silver compound, as Robert suggested
6.) replace the motherboard
7.) replace the CPU
8.) go to Windows XP :-)  (I have to keep a sense of humor to keep my
sanity, as I've been trying to solve this problem for a week).

-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
On Fri, 2005-07-22 at 17:24 -0400, Robert Crawford wrote:
 Joseph,
 Sorry, I haven't been reading this thread from the beginning, so I might have
 missed some of your first posts.
 
 If we eliminate heat as the problem (not saying we absolutely have), I'm
 starting to think it could be a misconfigured kernel, or a kernel bug itself.

I'm just following the AMD64 handbook instructions, and everything boots OK
after installation (if I don't get the kernel panic).

 What kernel are you currently using?  It might be worth compiling a
 new one, making sure all config options are correct for your system.

I'm on gentoo-sources 2.6.12-r6 (the newest one).

 One question (maybe you answered this before):
 Have you booted a live CD like Knoppix or Slax, and does the same problem
 occur?  If it does still happen, that would eliminate your kernel and/or
 hard drive as the source of the problem, and again focus back on the heat
 issue.
 
 If it doesn't happen after several hours of heavy usage on a live CD,
 that would eliminate the heat issue.

It does not happen when I do light computing, only during
compilation, when I'm emerging something.

-- 
#Joseph
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Bob Sanders
On Fri, 22 Jul 2005 17:52:09 -0600
Joseph [EMAIL PROTECTED] wrote:


  Does anybody have any other solutions?

There are a few tools that will allow you to do some diagnosing.

These will isolate your hard drive and drive controllers.

 app-benchmarks/bonnie (2.0.6):  Performance Test of Filesystem I/O using standard C library calls.
 app-benchmarks/bonnie++ (1.93c):  Hard drive bottleneck testing benchmark suite.

If it is the motherboard, it should fall over pretty quickly.
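A minimal sketch of a bonnie++ run aimed at one disk, assuming bonnie++ is installed: the test file is sized at twice physical RAM, which bonnie++ itself asks for, so the page cache cannot absorb the I/O. The helper name run_bonnie and the BONNIE override variable are hypothetical; -d (test directory), -s (file size in MB) and -u (user to run as) are standard bonnie++ options.

```shell
# run_bonnie (hypothetical helper): exercise one disk with bonnie++.
# $1 = a directory on the disk under test (e.g. a scratch dir on the SATA drive)
# $2 = meminfo file to size from (defaults to /proc/meminfo)
# BONNIE may name another command to run instead; it defaults to bonnie++.
run_bonnie() {
    dir=$1
    meminfo=${2:-/proc/meminfo}
    # MemTotal is reported in kB; convert to whole MB.
    ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo")
    # -s: twice RAM in MB; -u root: bonnie++ refuses to run as root
    # unless root is requested explicitly.
    "${BONNIE:-bonnie++}" -d "$dir" -s "$((ram_mb * 2))" -u root
}
```

Point it at a directory on the SATA disk, then repeat on the IDE disk; a fault that follows one drive or controller is a useful clue.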

Another tool I like is:

  app-benchmarks/stress (0.18.6):  Imposes stressful loads on different aspects of the system.

You'll have to add app-benchmarks/stress x86 to your /etc/portage/package.keywords,
as the ebuild doesn't have the amd64 keyword.  It builds and runs fine.
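The keyword step can be done idempotently from a root shell. A sketch; add_stress_keyword is a hypothetical helper, and it takes the file path as an argument so it can be tried on a scratch file first:

```shell
# add_stress_keyword (hypothetical helper): append the x86 keyword entry for
# stress to package.keywords, but only if it is not already there.
add_stress_keyword() {
    kwfile=${1:-/etc/portage/package.keywords}
    entry="app-benchmarks/stress x86"
    # -x: whole-line match, -F: fixed string; a missing file counts as "absent".
    if ! grep -qxF "$entry" "$kwfile" 2>/dev/null; then
        echo "$entry" >> "$kwfile"
    fi
}
```

As root: add_stress_keyword && emerge app-benchmarks/stress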

Stress allows you to load all or parts of the system up for a defined period of time.
It's even possible to run the system out of resources.  It's a real nice test of
system stability, covering everything except the X server, and that's easy to add
by running 3 of the rss-glx screensavers from a terminal while running stress.
And if you make the virtual memory component large enough at runtime, the system
will start swapping.

This line will get the load up to about 20 and cause about 500MB of swapping
to occur on a 1P amd64 system with 1 GB of main memory -

  stress --cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout 60s -d 2

Change the timeout to something longer, around 5 minutes (300s) or 10 minutes (600s).
Run tail -f /var/log/messages (or use root-tail) in one terminal, and top in another.
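The log-watching routine can also be wrapped in one function, so the last log lines before a lockup stay on screen. A sketch, assuming stress is installed and syslog writes to /var/log/messages; stress_with_log is a hypothetical name, and top still needs its own terminal:

```shell
# stress_with_log (hypothetical helper): follow the system log while stress
# runs, then stop following it once stress exits.
# $1 = stress timeout (defaults to 600s)
# $2 = log to follow (defaults to /var/log/messages)
stress_with_log() {
    duration=${1:-600s}
    logfile=${2:-/var/log/messages}
    tail -f "$logfile" &
    tail_pid=$!
    # Same load mix as the command above; only the timeout is a parameter.
    stress --cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout "$duration" -d 2
    status=$?
    # Stop the background tail and reap it quietly.
    kill "$tail_pid" 2>/dev/null || true
    wait "$tail_pid" 2>/dev/null || true
    return "$status"
}
```

For example, stress_with_log 600s runs the ten-minute version while the log scrolls in the same terminal.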

Bob
-  
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [Update] - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Robert Crawford
On Friday 22 July 2005 07:57 pm, Joseph wrote:
 On Fri, 2005-07-22 at 17:24 -0400, Robert Crawford wrote:
  Joseph,
  Sorry, I haven't been reading this thread from the beginning, so I might
  have missed some of your first posts.
 
  If we eliminate heat as the problem (not saying we absolutely have), I'm
  starting to think it could be a misconfigured kernel, or a kernel bug
  itself.

 I'm just following the AMD64 handbook instructions, and everything boots OK
 after installation (if I don't get the kernel panic).

  What kernel are you currently using?  It might be worth compiling a new
  one, making sure all config options are correct for your system.

 I'm on gentoo-sources 2.6.12-r6 (the newest one).

  One question (maybe you answered this before):
  Have you booted a live CD like Knoppix or Slax, and does the same problem
  occur?  If it does still happen, that would eliminate your kernel and/or
  hard drive as the source of the problem, and again focus back on the heat
  issue.
 
  If it doesn't happen after several hours of heavy usage on a live CD,
  that would eliminate the heat issue.

 It does not happen when I do light computing, only during
 compilation, when I'm emerging something.

 --
 #Joseph

OK, if it doesn't happen during light computing, and only with very
CPU-intensive stuff like compiling, I feel virtually certain it is a CPU heat
issue.  IMHO, there's not really any other reasonable explanation.

Have you investigated the CPU voltage setting in the BIOS (if your Asus board
has one to adjust it)?  It's probably set at the default for the CPU, but if
it's set too high, that will cause CPU overheating, especially with a borderline
heatsink/fan.  I'm not sure what the default voltage of your CPU should be;
look it up on the net, or read it from the code printed on the CPU, if you have
it.  For example, the Athlon 64 3200+ has a default operating voltage of
1.50 volts.  This link shows an example of the CPU codes written on the chips,
and what the markings mean.

http://www.digital-daily.com/cpu/amd-athlon64/

Robert
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: update - was 1.) Kernel panic - not syncing: Aiee, killing interrupt handler

2005-07-22 Thread Joseph
[snip]

Thank you for the suggestions; I'm reinstalling Gentoo and will definitely run
these tools.

I definitely have some kind of hardware memory problem; my latest error
message is:

Uhhuh. NMI received. Dazed and confused, but trying to continue 
You probably have a hardware problem with your RAM chips
NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: evdev via_rhine mii parport_pc parport ahci sata_uli
sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil sata_promise libata
sbp2 ohci1394 ieee1394 usb_storage ohci_hcd uhci_hcd ehci_hcd usbcore
Pid: 5626, comm: rsync Not tainted 2.6.11-gentoo-r3-k8
RIP: 0010:[.

Though I ran memtest86 two days ago, and 17 passes completed without any
errors.
 
-- 
#Joseph

 
 There are a few tools that will allow you to do some diagnosing.
 
 These will isolate your hard drive and drive controllers.
 
  app-benchmarks/bonnie (2.0.6):  Performance Test of Filesystem I/O using standard C library calls.
  app-benchmarks/bonnie++ (1.93c):  Hard drive bottleneck testing benchmark suite.
 
 If it is the motherboard, it should fall over pretty quickly.
 
 Another tool I like is:
 
   app-benchmarks/stress (0.18.6):  Imposes stressful loads on different aspects of the system.
 
 You'll have to add app-benchmarks/stress x86 to your /etc/portage/package.keywords,
 as the ebuild doesn't have the amd64 keyword.  It builds and runs fine.
 
 Stress allows you to load all or parts of the system up for a defined period of time.
 It's even possible to run the system out of resources.  It's a real nice test of
 system stability, covering everything except the X server, and that's easy to add
 by running 3 of the rss-glx screensavers from a terminal while running stress.
 And if you make the virtual memory component large enough at runtime, the system
 will start swapping.
 
 This line will get the load up to about 20 and cause about 500MB of swapping
 to occur on a 1P amd64 system with 1 GB of main memory -
 
   stress --cpu 16 --io 4 --vm 2 --vm-bytes 1024M --timeout 60s -d 2
 
 Change the timeout to something longer, around 5 minutes (300s) or 10 minutes (600s).
 Run tail -f /var/log/messages (or use root-tail) in one terminal, and top in another.
 
 Bob
 -  

-- 
gentoo-user@gentoo.org mailing list