Re: kernel bug in 2.6.25-14.fc9.x86_64 for r8101E NIC?
Alan Cox wrote:

- GSI 17 (level, low) - IRQ 17
Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

What is needed to debug this is all the stuff after the BUG: line - the numbers and trace information. If you can capture that then it should be easy to work out if what you are seeing is a fixed bug and a kernel update will help.

Alan

Will do.

I renamed the old driver r8169 to offr8169, rebooted and was able to install the r8101 driver. (There was a typo in the original post indicating it was the r8108 driver.)

I'll regenerate the error messages and post the results on Friday. (I deleted the old /var/log/messages file since there was too much noise in it, so I need to reboot to regenerate the errors, but my machine is currently updating Fedora.)

The LAN, wireless and sound are working, so I'm a happy camper.

--
Article. VI. Clause 3 of the constitution of the United States states:

The Senators and Representatives before mentioned, and the Members of the several State Legislatures, and all executive and judicial Officers, both of the United States and of the several States, shall be bound by Oath or Affirmation, to support this Constitution; but no religious Test shall ever be required as a Qualification to any Office or public Trust under the United States.

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines
Re: kernel bug?
On Wed, Sep 24, 2008 at 9:18 AM, Agile Aspect [EMAIL PROTECTED] wrote:

Hi -

I have a Toshiba Satellite L355D-S7815 laptop with r8108E (10/100 Mbits/sec) NIC which I'm trying to get working.

There appears to be a bug in the default kernel (2.6.25-14.fc9.x86_64) for Fedora 9:

Sep 24 09:02:22 localhost kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
Sep 24 09:02:22 localhost kernel: ACPI: PCI Interrupt :04:00.0[A] - GSI 17 (level, low) - IRQ 17
Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

I downloaded the r8108E driver from RealTek which I can build and install, but the problem is I can *NOT*

rmmod r8169 (ERROR: Module in use)

or

insmod r8108 (insmod: error inserting 'r8101.ko' : -1 Invalid module format)

I am dead in the water. Any help would be greatly appreciated.

Try stopping networking first; i.e. /etc/init.d/network stop, or boot into runlevel 1.
Re: kernel bug in 2.6.25-14.fc9.x86_64 for r8101E NIC?
- GSI 17 (level, low) - IRQ 17
Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

What is needed to debug this is all the stuff after the BUG: line - the numbers and trace information. If you can capture that then it should be easy to work out if what you are seeing is a fixed bug and a kernel update will help.

Alan
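Editorial note: the request above is for everything the kernel logged after the BUG: line. A minimal Python sketch of pulling that block out of a syslog-style file follows. The function name `extract_oops`, the `max_following` cutoff and the " kernel: " filter are assumptions made for illustration and do not come from the thread; the real oops may be longer or shorter than the cutoff.

```python
import re
from typing import List

# Start of a kernel oops report as it appears in a syslog-style file.
BUG_MARKER = re.compile(r"kernel: BUG: ")


def extract_oops(lines: List[str], max_following: int = 60) -> List[str]:
    """Return the first 'BUG:' line plus the kernel lines that follow it.

    Only the next `max_following` log lines are inspected (an arbitrary
    cutoff), and lines from other daemons are skipped.
    """
    out: List[str] = []
    for i, line in enumerate(lines):
        if BUG_MARKER.search(line):
            out.append(line.rstrip("\n"))
            for follow in lines[i + 1 : i + 1 + max_following]:
                # Keep kernel messages only; drop unrelated daemon noise.
                if " kernel: " in follow:
                    out.append(follow.rstrip("\n"))
            break
    return out
```

To use it, read /var/log/messages (the file named in this thread) into a list of lines, pass the list to `extract_oops`, and post the result.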
Re: kernel bug?
Kam Leo wrote:

On Wed, Sep 24, 2008 at 9:18 AM, Agile Aspect [EMAIL PROTECTED] wrote:

Hi -

I have a Toshiba Satellite L355D-S7815 laptop with r8108E (10/100 Mbits/sec) NIC which I'm trying to get working.

There appears to be a bug in the default kernel (2.6.25-14.fc9.x86_64) for Fedora 9:

Sep 24 09:02:22 localhost kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
Sep 24 09:02:22 localhost kernel: ACPI: PCI Interrupt :04:00.0[A] - GSI 17 (level, low) - IRQ 17
Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

I downloaded the r8108E driver from RealTek which I can build and install, but the problem is I can *NOT*

rmmod r8169 (ERROR: Module in use)

or

insmod r8108 (insmod: error inserting 'r8101.ko' : -1 Invalid module format)

I am dead in the water. Any help would be greatly appreciated.

Try stopping networking first; i.e. /etc/init.d/network stop, or boot into runlevel 1.

Thanks for the reply - I already tried it.

I shut down both /etc/init.d/network and /etc/init.d/NetworkManager and it didn't help.

And I get a kernel error regardless of whether or not the LAN is turned on in the BIOS.
Re: Kernel bug or disk failure
Sam Varshavchik wrote, On 07/13/2008 10:51 AM:

Chris Snook writes:

Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul 6 04:05:38 commodore kernel: Dump Card State Begins
Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul 6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx. This does not look like a disk error to me. I re-add the drive into the array, and rebuild with no downtime. SMART shows 0 in the defect list on this drive, and over the disk's lifetime 0 uncorrectable reads and 1 uncorrectable write -- but this kernel barf already happened 4-5 times now, and it's getting rather annoying.

Looks more like a controller problem than a drive problem. Do you have a spare HBA to test?

No, but I have one on order, now. I reseated the cable, that didn't help -- the card dumped again about 12 hours later, but it was, apparently, non-fatal because RAID did not degrade.

May I suggest that, when it is convenient to do so, you:

1) reboot
2) Catch the scsi card ( Ctrl-A ) when the aic79xxx boot text shows up during bios operations.
3) set the speed of the scsi bus to that drive to a little slower.
4) if you get the fault or the drive is not recognized, repeat until you get a desired result (some drives do not work at ALL the speeds slower than the one they are rated at; a Promise U160 rated array communicated only at 160, 80, 66, 16 6).

I had to work with a Promise array which:

a) was a bit flaky even compared to its twin in the other bay (too late to warranty either one when I arrived).
b) had Promise's problem of not knowing how to do domain validation, so I had to turn that off (domain validation only made the arrays flake out sooner).
c) could not work reliably above ~20MB/sec (write or read).
d) was dropping errors similar to what you have above within ~4-8 hours of operation.

Slowing it down using the card's settings made it work reliably enough to get the job done.

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
Re: Kernel bug or disk failure
Todd Denniston writes:

Sam Varshavchik wrote, On 07/13/2008 10:51 AM:

Chris Snook writes:

Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul 6 04:05:38 commodore kernel: Dump Card State Begins
Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul 6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx. This does not look like a disk error to me. I re-add the drive into the array, and rebuild with no downtime. SMART shows 0 in the defect list on this drive, and over the disk's lifetime 0 uncorrectable reads and 1 uncorrectable write -- but this kernel barf already happened 4-5 times now, and it's getting rather annoying.

Looks more like a controller problem than a drive problem. Do you have a spare HBA to test?

No, but I have one on order, now. I reseated the cable, that didn't help -- the card dumped again about 12 hours later, but it was, apparently, non-fatal because RAID did not degrade.

May I suggest that, when it is convenient to do so, you:

1) reboot
2) Catch the scsi card ( Ctrl-A ) when the aic79xxx boot text shows up during bios operations.
3) set the speed of the scsi bus to that drive to a little slower.
4) if you get the fault or the drive is not recognized, repeat until you get a desired result (some drives do not work at ALL the speeds slower than it is rated at, Promise U160 rated array communicated only at 160, 80, 66, 16 6).

I'll try that if the replacement card still trips like this. These drives have been spinning away, 24x7, for a few years, with nary a hiccup.
Re: Kernel bug or disk failure
Chris Snook writes:

Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul 6 04:05:38 commodore kernel: Dump Card State Begins
Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul 6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx. This does not look like a disk error to me. I re-add the drive into the array, and rebuild with no downtime. SMART shows 0 in the defect list on this drive, and over the disk's lifetime 0 uncorrectable reads and 1 uncorrectable write -- but this kernel barf already happened 4-5 times now, and it's getting rather annoying.

Looks more like a controller problem than a drive problem. Do you have a spare HBA to test?

No, but I have one on order, now. I reseated the cable, that didn't help -- the card dumped again about 12 hours later, but it was, apparently, non-fatal because RAID did not degrade.
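Editorial note: the thread describes the card dumping "every other week or so" and then again about 12 hours after the cable was reseated. A rough Python sketch for measuring those intervals from the log is below. It assumes classic syslog timestamps that omit the year (as in the excerpt above), so the caller supplies the year; the function names `dump_times` and `gaps_in_hours` are hypothetical and not part of any tool mentioned in the thread.

```python
from datetime import datetime
from typing import List

# Line the driver logs at the start of each controller state dump.
DUMP_MARKER = "Dump Card State Begins"


def dump_times(lines: List[str], year: int) -> List[datetime]:
    """Return the timestamp of every controller state dump in the log lines."""
    times: List[datetime] = []
    for line in lines:
        if DUMP_MARKER in line:
            # Syslog lines start with month, day and clock, e.g. "Jul 6 04:05:38".
            month, day, clock = line.split()[:3]
            stamp = f"{year} {month} {day} {clock}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_hours(times: List[datetime]) -> List[float]:
    """Return the hours elapsed between each pair of consecutive dumps."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

Comparing the gaps before and after a change (reseated cable, slower bus speed, replacement HBA) gives a crude indication of whether the change helped. The sketch does not handle logs that span a year boundary.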
Re: Kernel bug or disk failure
Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul 6 04:05:38 commodore kernel: Dump Card State Begins
Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul 6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx. This does not look like a disk error to me. I re-add the drive into the array, and rebuild with no downtime. SMART shows 0 in the defect list on this drive, and over the disk's lifetime 0 uncorrectable reads and 1 uncorrectable write -- but this kernel barf already happened 4-5 times now, and it's getting rather annoying.

Looks more like a controller problem than a drive problem. Do you have a spare HBA to test?

-- Chris