Re: kernel bug in 2.6.25-14.fc9.x86_64 for r8101E NIC?

2008-09-25 Thread Agile Aspect
Alan Cox wrote:
 - GSI 17 (level, low) - IRQ 17
 Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL
 pointer dereference at 0208
 

 What is needed to debug this is all the stuff after the BUG: line - the
 numbers and trace information. If you can capture that then it should be
 easy to work out if what you are seeing is a fixed bug and a kernel
 update will help.

 Alan

   
Will do.

I renamed the old driver r8169 to offr8169,
rebooted and was able to install the r8101 driver. 
(There was a typo in the original post indicating it
was the r8108  driver.)
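
Renaming the module file works, but a blacklist entry is another way to keep r8169 from being auto-loaded without touching the kernel's module tree. A minimal sketch, assuming modprobe on this system reads files under /etc/modprobe.d/; the function name and the default file name are hypothetical choices, not something taken from the driver documentation:

```shell
# Append "blacklist <module>" to a modprobe.d file unless it is already there.
# Needs root when writing under /etc. The default path is an assumption.
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist-$module.conf}"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "blacklist $module" "$conf" 2>/dev/null \
        || echo "blacklist $module" >> "$conf"
}
```

Run as root, for example `blacklist_module r8169`. A blacklist entry only stops alias-based auto-loading; an explicit `modprobe r8169` would still load the driver.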

I'll re-generate the error messages and post the
results on Friday. (I deleted the old /var/log/messages
file since there was too much noise in it, so I need to
reboot to regenerate the errors but my machine is
currently updating Fedora.)

The LAN, wireless and sound are working so I'm
a happy camper.

-- 
Article. VI. Clause 3 of the constitution of the United States states: 

The Senators and Representatives before mentioned, and the Members of 
the several State Legislatures, and all executive and judicial Officers, 
both of the United States and of the several States, shall be bound by 
Oath or Affirmation, to support this Constitution; but no religious Test 
shall ever be required as a Qualification to any Office or public Trust 
under the United States. 


-- 
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines


Re: kernel bug?

2008-09-24 Thread Kam Leo
On Wed, Sep 24, 2008 at 9:18 AM, Agile Aspect [EMAIL PROTECTED] wrote:
 Hi - I have a Toshiba Satellite L355D-S7815 laptop with r8108E (10/100
 Mbits/sec) NIC which I'm trying to get working.

 There appears to be a bug in the default kernel (2.6.25-14.fc9.x86_64)
 for Fedora 9:

   Sep 24 09:02:22 localhost kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
   Sep 24 09:02:22 localhost kernel: ACPI: PCI Interrupt :04:00.0[A] - GSI 17 (level, low) - IRQ 17
   Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

 I downloaded the r8108E driver from RealTek, which I can build and
 install, but the problem is I can *NOT*

   rmmod r8169   (ERROR: Module in use)

 or

   insmod r8108  (insmod: error inserting 'r8101.ko' : -1 Invalid module format)

 I am dead in the water.

 Any help would be greatly appreciated.


Try stopping networking first; i.e. /etc/init.d/network stop, or
boot into runlevel 1.
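
Before retrying rmmod, it can help to see why the kernel considers the module busy. A small sketch that reads /proc/modules (the same data lsmod prints) and shows the reference count, dependent modules and load state for one module; the function name module_users is made up for illustration:

```shell
# Show refcount, dependants and state of one module.
# /proc/modules columns: name size refcount used_by state address
module_users() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$module" \
        '$1 == m { print "refcount=" $3, "used_by=" $4, "state=" $5 }' \
        "$modules_file"
}
```

For example, `module_users r8169`. A state other than Live (for example Loading) would suggest the module is stuck in initialisation, possibly because of the oops, in which case stopping services is unlikely to free it and a reboot may be needed.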



Re: kernel bug in 2.6.25-14.fc9.x86_64 for r8101E NIC?

2008-09-24 Thread Alan Cox
 - GSI 17 (level, low) - IRQ 17
 Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL
 pointer dereference at 0208

What is needed to debug this is all the stuff after the BUG: line - the
numbers and trace information. If you can capture that then it should be
easy to work out if what you are seeing is a fixed bug and a kernel
update will help.

Alan
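
A minimal sketch of collecting that information, assuming syslog wrote the oops to /var/log/messages; the function name oops_context and the 40-line default window are arbitrary choices:

```shell
# Print the first line containing "BUG:" plus the N lines that follow it.
# Defaults (log path, 40 lines) are assumptions; widen the window if the
# trace is cut off.
oops_context() {
    logfile="${1:-/var/log/messages}"
    lines="${2:-40}"
    awk -v n="$lines" '
        /BUG:/ && !found { found = 1; left = n + 1 }
        left > 0         { print; left-- }
    ' "$logfile"
}
```

Something like `oops_context /var/log/messages 60 > oops.txt` would save the BUG: line and the 60 lines after it for posting. If the log has been rotated or deleted, the output of `dmesg` from the same boot can be saved to a file and passed in instead.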



Re: kernel bug?

2008-09-24 Thread Agile Aspect
Kam Leo wrote:
 On Wed, Sep 24, 2008 at 9:18 AM, Agile Aspect [EMAIL PROTECTED] wrote:
   
 Hi - I have a Toshiba Satellite L355D-S7815 laptop with r8108E (10/100
 Mbits/sec) NIC which I'm trying to get working.

 There appears to be a bug in the default kernel (2.6.25-14.fc9.x86_64)
 for Fedora 9:

   Sep 24 09:02:22 localhost kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
   Sep 24 09:02:22 localhost kernel: ACPI: PCI Interrupt :04:00.0[A] - GSI 17 (level, low) - IRQ 17
   Sep 24 09:02:22 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0208

 I downloaded the r8108E driver from RealTek, which I can build and
 install, but the problem is I can *NOT*

   rmmod r8169   (ERROR: Module in use)

 or

   insmod r8108  (insmod: error inserting 'r8101.ko' : -1 Invalid module format)

 I am dead in the water.

 Any help would be greatly appreciated.

 

 Try stopping networking first; i.e. /etc/init.d/network stop, or
 boot into runlevel 1.

   
Thanks for the reply - I already tried it.  I shut down both
/etc/init.d/network
and /etc/init.d/NetworkManager and it didn't help.

And I get a kernel error regardless of whether or not the LAN is turned on in
the BIOS.
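
As for the insmod failure quoted above: "Invalid module format" is often (not always) a sign that the module was built against a different kernel than the one running. A rough first check, assuming modinfo is available; vermagic_matches is a hypothetical helper and it compares only the kernel release, not the remaining vermagic flags:

```shell
# Succeed if the first word of a module's vermagic string equals the
# given kernel release. Flags after the release (SMP, mod_unload, ...)
# are deliberately ignored in this sketch.
vermagic_matches() {
    built_for="${1%% *}"   # strip everything from the first space onward
    running="$2"
    [ "$built_for" = "$running" ]
}

# Usage (r8101.ko is the file name from the error message):
#   vermagic_matches "$(modinfo -F vermagic r8101.ko)" "$(uname -r)" \
#       || echo "module was built for a different kernel"
```

The kernel normally logs the precise reason for the rejection, so `dmesg | tail` right after the failed insmod is worth a look either way.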





Re: Kernel bug or disk failure

2008-07-14 Thread Todd Denniston

Sam Varshavchik wrote, On 07/13/2008 10:51 AM:

Chris Snook writes:


Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul  6 04:05:38 commodore kernel: Dump Card State Begins
Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul  6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is 
aic79xxx.


This does not look like a disk error to me. I re-add the drive into 
the array, and rebuild with no downtime. SMART shows 0 in the defect 
list on this drive, and over the disk's lifetime 0 uncorrectable 
reads and 1 uncorrectable write -- but this kernel barf already 
happened 4-5 times now, and it's getting rather annoying.




Looks more like a controller problem than a drive problem.  Do you 
have a spare HBA to test?


No, but I have one on order, now. I reseated the cable, that didn't help 
-- the card dumped again about 12 hours later, but it was, apparently, 
non-fatal because RAID did not degrade.




May I suggest that, when it is convenient to do so, you:
1) Reboot.
2) Catch the SCSI card (Ctrl-A) when the aic79xxx boot text shows up during
BIOS operations.
3) Set the speed of the SCSI bus to that drive a little slower.
4) If you get the fault or the drive is not recognized, repeat until you get the
desired result (some drives do not work at ALL of the speeds slower than the one
they are rated at; a Promise U160-rated array communicated only at 160, 80, 66, 16 and 6).


I had to work with a Promise array which
a) was a bit flaky even compared to its twin in the other bay (too late to
claim warranty on either one when I arrived),
b) had Promise's problem of not knowing how to do domain validation, so I had
to turn that off (domain validation only made the arrays flake out sooner),
c) could not work reliably above ~20MB/sec (write or read), and
d) was dropping errors similar to what you have above within ~4-8 hours of operation.
Slowing it down using the card's settings made it work reliably enough to get 
the job done.
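
To judge whether a slower bus setting is actually helping, it is useful to know when each dump happened. A small sketch, assuming the driver messages land in /var/log/messages with the "Dump Card State Begins" text on a single line, as in a normal syslog file; dump_times is a made-up name:

```shell
# Print the syslog timestamp (month, day, time) of every controller dump.
# The default log path is an assumption.
dump_times() {
    logfile="${1:-/var/log/messages}"
    awk '/Dump Card State Begins/ { print $1, $2, $3 }' "$logfile"
}
```

Comparing the gaps between timestamps before and after changing the speed gives a rough idea of whether the interval between faults is growing.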


--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter



Re: Kernel bug or disk failure

2008-07-14 Thread Sam Varshavchik

Todd Denniston writes:


Sam Varshavchik wrote, On 07/13/2008 10:51 AM:

Chris Snook writes:


Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul  6 04:05:38 commodore kernel: Dump Card State Begins
Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul  6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is 
aic79xxx.


This does not look like a disk error to me. I re-add the drive into 
the array, and rebuild with no downtime. SMART shows 0 in the defect 
list on this drive, and over the disk's lifetime 0 uncorrectable 
reads and 1 uncorrectable write -- but this kernel barf already 
happened 4-5 times now, and it's getting rather annoying.




Looks more like a controller problem than a drive problem.  Do you 
have a spare HBA to test?


No, but I have one on order, now. I reseated the cable, that didn't help 
-- the card dumped again about 12 hours later, but it was, apparently, 
non-fatal because RAID did not degrade.




May I suggest that, when it is convenient to do so, you:
1) Reboot.
2) Catch the SCSI card (Ctrl-A) when the aic79xxx boot text shows up during
BIOS operations.
3) Set the speed of the SCSI bus to that drive a little slower.
4) If you get the fault or the drive is not recognized, repeat until you get the
desired result (some drives do not work at ALL of the speeds slower than the one
they are rated at; a Promise U160-rated array communicated only at 160, 80, 66, 16 and 6).


I'll try that if the replacement card still trips like this. These drives 
have been spinning away, 24x7, for a few years, with nary a hiccup.






Re: Kernel bug or disk failure

2008-07-13 Thread Sam Varshavchik

Chris Snook writes:


Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul  6 04:05:38 commodore kernel: Dump Card State Begins
Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul  6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx.

This does not look like a disk error to me. I re-add the drive into the 
array, and rebuild with no downtime. SMART shows 0 in the defect list on 
this drive, and over the disk's lifetime 0 uncorrectable reads and 1 
uncorrectable write -- but this kernel barf already happened 4-5 times 
now, and it's getting rather annoying.




Looks more like a controller problem than a drive problem.  Do you have a spare 
HBA to test?


No, but I have one on order, now. I reseated the cable, that didn't help -- 
the card dumped again about 12 hours later, but it was, apparently, 
non-fatal because RAID did not degrade.
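
One way to confirm that, assuming this is Linux software RAID (md), which the thread does not actually state: list any arrays whose member map in /proc/mdstat contains an underscore, i.e. a missing device. degraded_arrays is a hypothetical helper name:

```shell
# Print the name of every md array whose status map (e.g. [U_]) shows a
# missing member. Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        /^md/                  { name = $1 }   # remember the current array
        /\[[U_]*_[U_]*\]/      { print name }  # map contains at least one "_"
    ' "$mdstat"
}
```

No output means every array shows all of its members up ([UU] and so on).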





Re: Kernel bug or disk failure

2008-07-11 Thread Chris Snook

Sam Varshavchik wrote:

Every other week or so, I get a disk kicked out of my RAID, with this:

Jul  6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul  6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul  6 04:05:38 commodore kernel: Dump Card State Begins
Jul  6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul  6 04:05:38 commodore kernel: Card was paused

… followed by a rather dry dump of the HBA's registers. This is aic79xxx.

This does not look like a disk error to me. I re-add the drive into the 
array, and rebuild with no downtime. SMART shows 0 in the defect list on 
this drive, and over the disk's lifetime 0 uncorrectable reads and 1 
uncorrectable write -- but this kernel barf already happened 4-5 times 
now, and it's getting rather annoying.




Looks more like a controller problem than a drive problem.  Do you have a spare 
HBA to test?


-- Chris
