Bug#519158: Still problems with already changed hardware
Here is an update on the problem. Since there were still problems with the drives, I changed the hardware as follows:

- I replaced the motherboard (Intel D945GCLF2), as the errors only occurred on drive sda (connected to one of the two SATA ports on the board itself).
- I replaced the SSD with two drives of the same model as the three drives on the SiI 3124 (none of those three has had similar problems since the machine went into service), so there are now two WDC WD5000ABPS-0 Rev: 02.0 connected directly to the motherboard.

I hoped this would solve the problem, but last night the SATA subsystem had trouble again:

-- snip
Jun 18 06:46:37 atom kernel: [697701.292480] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 18 06:46:37 atom kernel: [697701.292520] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 18 06:46:37 atom kernel: [697701.292523] res 51/04:00:0a:24:f9/00:00:00:00:00/a9 Emask 0x1 (device error)
Jun 18 06:46:37 atom kernel: [697701.292583] ata1.00: status: { DRDY ERR }
Jun 18 06:46:37 atom kernel: [697701.292604] ata1.00: error: { ABRT }
Jun 18 06:46:38 atom kernel: [697701.316560] ata1.00: failed to read native max address (err_mask=0x1)
Jun 18 06:46:38 atom kernel: [697701.316589] ata1.00: HPA support seems broken, skipping HPA handling
Jun 18 06:46:38 atom kernel: [697701.828285] ata1.00: configured for UDMA/133 (device error ignored)
Jun 18 06:46:38 atom kernel: [697701.828327] end_request: I/O error, dev sda, sector 59922239
Jun 18 06:46:38 atom kernel: [697701.828357] md: super_written gets error=-5, uptodate=0
Jun 18 06:46:38 atom kernel: [697701.828384] raid1: Disk failure on sda1, disabling device.
Jun 18 06:46:38 atom kernel: [697701.828386] raid1: Operation continuing on 1 devices.
Jun 18 06:46:38 atom kernel: [697701.828462] ata1: EH complete
Jun 18 06:46:38 atom kernel: [697701.828655] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors: (500 GB/465 GiB)
Jun 18 06:46:38 atom kernel: [697701.828769] sd 0:0:0:0: [sda] Write Protect is off
Jun 18 06:46:38 atom kernel: [697701.828795] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 18 06:46:38 atom kernel: [697701.828876] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 18 06:46:38 atom kernel: [697702.255140] RAID1 conf printout:
Jun 18 06:46:38 atom kernel: [697702.255169] --- wd:1 rd:2
Jun 18 06:46:38 atom kernel: [697702.255191] disk 0, wo:0, o:1, dev:sdb1
Jun 18 06:46:38 atom kernel: [697702.255213] disk 1, wo:1, o:0, dev:sda1
Jun 18 06:46:38 atom kernel: [697702.260017] RAID1 conf printout:
Jun 18 06:46:38 atom kernel: [697702.260040] --- wd:1 rd:2
Jun 18 06:46:38 atom kernel: [697702.260060] disk 0, wo:0, o:1, dev:sdb1
Jun 18 06:50:14 atom kernel: [697918.50] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 18 06:50:14 atom kernel: [697918.92] ata1.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
Jun 18 06:50:14 atom kernel: [697918.95] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 06:50:14 atom kernel: [697918.000155] ata1.00: status: { DRDY }
Jun 18 06:50:19 atom kernel: [697923.040022] ata1: link is slow to respond, please be patient (ready=0)
Jun 18 06:50:24 atom kernel: [697928.024026] ata1: device not ready (errno=-16), forcing hardreset
Jun 18 06:50:24 atom kernel: [697928.024061] ata1: soft resetting link
Jun 18 06:50:29 atom kernel: [697933.220020] ata1: link is slow to respond, please be patient (ready=0)
Jun 18 06:50:39 atom kernel: [697942.536144] ata1.00: qc timeout (cmd 0xec)
Jun 18 06:50:39 atom kernel: [697942.536177] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 18 06:50:39 atom kernel: [697942.536204] ata1.00: revalidation failed (errno=-5)
Jun 18 06:50:44 atom kernel: [697947.576020] ata1: link is slow to respond, please be patient (ready=0)
Jun 18 06:50:49 atom kernel: [697952.560023] ata1: device not ready (errno=-16), forcing hardreset
Jun 18 06:50:49 atom kernel: [697952.560061] ata1: soft resetting link
Jun 18 06:50:58 atom kernel: [697961.476299] ata1.00: configured for UDMA/133 (device error ignored)
Jun 18 06:50:58 atom kernel: [697961.476371] ata1: EH complete
Jun 18 06:50:58 atom kernel: [697961.478921] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors: (500 GB/465 GiB)
Jun 18 06:50:58 atom kernel: [697961.479078] sd 0:0:0:0: [sda] Write Protect is off
Jun 18 06:50:58 atom kernel: [697961.479110] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 18 06:50:58 atom kernel: [697961.479212] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 18 06:55:11 atom kernel: [698215.63] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 18 06:55:11 atom kernel: [698215.000101] ata1.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
Jun 18 06:55:11 atom kernel: [698215.000104] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 06:55:11 atom kernel:
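
For readers following the log, the `cmd` and `res` lines pack the ATA taskfile registers into hex fields. The sketch below is only an illustration, not part of the original report: it assumes the first two fields are command/feature on a `cmd` line and status/error on a `res` line, and it uses the standard ATA bit and opcode names. The tables and the function names `flags` and `decode` are made up for this example, and the tables cover only the flags and opcodes that appear in this thread.

```python
import re

# ATA status register bits (names as printed in "status: { ... }").
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# ATA error register bits (names as printed in "error: { ... }").
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
# Only the opcodes that occur in this thread; not a complete table.
COMMANDS = {
    0xB0: "SMART",
    0xCA: "WRITE DMA",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
    0xEF: "SET FEATURES",
}

# Matches the first two hex fields of a "cmd xx/yy:..." or "res xx/yy:..." line.
TASKFILE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/([0-9a-f]{2}):")


def flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def decode(line):
    """Decode one cmd/res log line; return None if the line has no taskfile."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    first, second = int(match.group(2), 16), int(match.group(3), 16)
    if kind == "cmd":
        return {"kind": "cmd", "command": COMMANDS.get(first, hex(first)), "feature": second}
    return {"kind": "res", "status": flags(first, STATUS_BITS), "error": flags(second, ERROR_BITS)}


if __name__ == "__main__":
    for sample in (
        "ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0",
        "res 51/04:00:0a:24:f9/00:00:00:00:00/a9 Emask 0x1 (device error)",
        "ata1.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0",
        "res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)",
    ):
        print(decode(sample))
```

Read this way, the first failure is a cache flush (0xea) that the drive aborted, and the later timeouts are SMART commands (0xb0) that the drive never answered.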
Bug#519158:
The ATA subsystem has been working so far, but with the new kernel a different problem showed up today:

[608566.964023] [ cut here ]
[608566.964054] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xf6/0x17c()
[608566.964092] NETDEV WATCHDOG: eth0 (r8169): transmit timed out
[608566.964117] Modules linked in: tun nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables ipv6 smsc47m192 hwmon_vid loop joydev evdev psmouse snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw pcspkr rng_core i2c_i801 i2c_core iTCO_wdt asix usbnet button intel_agp agpgart ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid456 md_mod async_xor async_memcpy async_tx xor usbhid hid usb_storage sd_mod crc_t10dif ata_piix ata_generic sata_sil24 ehci_hcd libata ide_pci_generic scsi_mod ide_core uhci_hcd usbcore r8169 mii thermal processor fan thermal_sys
[608566.964488] Pid: 0, comm: swapper Not tainted 2.6.28-1-686 #1
[608566.964512] Call Trace:
[608566.964538] [c0126d36] warn_slowpath+0x5a/0x79
[608566.964566] [c01f85d4] __next_cpu+0x12/0x21
[608566.964591] [c011f888] find_busiest_group+0x307/0x78f
[608566.964620] [c013ca9b] getnstimeofday+0x4f/0xd1
[608566.964646] [c01fc1fc] strlcpy+0x11/0x3d
[608566.964669] [c028dac2] dev_watchdog+0xf6/0x17c
[608566.964695] [c013aeb3] sched_clock_tick+0x95/0x9e
[608566.964721] [c0138da6] hrtimer_forward+0x10c/0x124
[608566.964747] [c013ca9b] getnstimeofday+0x4f/0xd1
[608566.964773] [c011025d] lapic_next_event+0x10/0x13
[608566.964798] [c028d9cc] dev_watchdog+0x0/0x17c
[608566.964823] [c012e4b4] run_timer_softirq+0x14a/0x1b4
[608566.964849] [c028d9cc] dev_watchdog+0x0/0x17c
[608566.964874] [c012b28f] __do_softirq+0x8c/0x130
[608566.965679] [c012b378] do_softirq+0x45/0x53
[608566.965703] [c012b480] irq_exit+0x35/0x69
[608566.965728] [c01109c5] smp_apic_timer_interrupt+0x6e/0x78
[608566.965754] [c0104620] apic_timer_interrupt+0x28/0x30
[608566.965780] [c010924a] mwait_idle+0x2f/0x3b
[608566.965804] [c0102a37] cpu_idle+0x71/0x8a
[608566.965827] ---[ end trace 23769fc216abaa67 ]---
[608566.982729] r8169: eth0: link up

Somehow the onboard LAN hiccuped. The disk subsystem is still fine, but the outcome seems similar to the problems above. The LAN came back up by itself, without needing to reboot the machine.

Any suggestions?

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
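
To check how quickly the link recovers after such a watchdog warning, the kernel timestamps can be compared directly. The following is a minimal sketch, not something from the original report: the function name `recovery_times` and both regular expressions are assumptions based only on the message formats shown in the trace above. It measures the gap between the two log messages, which is not necessarily the full length of the network stall.

```python
import re

# "[seconds.micros] message" as printed by the kernel with timestamps enabled.
ENTRY = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")
# Format of the watchdog message seen in the trace above.
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\S+) \((\w+)\): transmit timed out")
# Format of the driver's link-up message, e.g. "r8169: eth0: link up".
LINK_UP = re.compile(r"(\S+): link up")


def recovery_times(lines):
    """Return (interface, driver, seconds) for each timeout followed by a link-up."""
    pending = {}  # interface -> (driver, timestamp of the watchdog warning)
    results = []
    for line in lines:
        entry = ENTRY.search(line)
        if entry is None:
            continue
        stamp, message = float(entry.group(1)), entry.group(2)
        timeout = WATCHDOG.search(message)
        if timeout:
            pending[timeout.group(1)] = (timeout.group(2), stamp)
            continue
        up = LINK_UP.search(message)
        if up and up.group(1) in pending:
            driver, started = pending.pop(up.group(1))
            results.append((up.group(1), driver, stamp - started))
    return results


if __name__ == "__main__":
    sample = [
        "[608566.964092] NETDEV WATCHDOG: eth0 (r8169): transmit timed out",
        "[608566.965827] ---[ end trace 23769fc216abaa67 ]---",
        "[608566.982729] r8169: eth0: link up",
    ]
    for interface, driver, seconds in recovery_times(sample):
        print(f"{interface} ({driver}): link up {seconds * 1000:.1f} ms after the warning")
```

For the trace above this gives a gap of under 20 ms between the warning and the link-up message, which fits the observation that the LAN recovered without a reboot.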
Bug#519158: linux-image-2.6.26-1-686: ATA subsystem crashes and renders single drive unusable
On Tue, 2009-03-10 at 19:19 +0100, maximilian attems wrote:
> can you try 2.6.28 sid snapshot, see sid aptline - http://wiki.debian.org/DebianKernel

Just installed and booted it.

[m...@atom ~]$ uname -a
Linux atom 2.6.28-1-686 #1 SMP Wed Mar 11 04:36:21 UTC 2009 i686 GNU/Linux

The longest delay before the problem showed up has been 18 days so far, so we will have to wait.
Bug#519158: linux-image-2.6.26-1-686: ATA subsystem crashes and renders single drive unusable
Package: linux-image-2.6.26-1-686
Version: 2.6.26-13
Severity: important

I am using the following hardware configuration:

Mainboard: Intel D945GCLF2
Additional SATA controller: Silicon Image, Inc. SiI 3124
Connected directly to the board: 1 Transcend TS32GSSD25S-M SATA SSD Rev: V082
Connected to the SiI controller: 3 WDC WD5000ABPS-0 Rev: 02.0

From time to time (at intervals of 1 to 4 weeks) the system gets into a state where the following kernel messages are logged:

Mar 8 21:19:08 atom kernel: [1953069.141042] [ cut here ]
Mar 8 21:19:08 atom kernel: [1953069.141042] WARNING: at drivers/ata/libata-sff.c:1321 ata_sff_hsm_move+0x5ff/0x674 [libata]()
Mar 8 21:19:08 atom kernel: [1953069.141042] Modules linked in: tun nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables ipv6 usb_storage smsc47m192 hwmon_vid loop serio_raw rng_core button psmouse asix usbnet snd_pcm snd_timer snd iTCO_wdt i2c_i801 soundcore mii snd_page_alloc i2c_core pcspkr intel_agp agpgart joydev evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod raid456 md_mod async_xor async_memcpy async_tx xor sd_mod ata_piix usbhid hid ff_memless ata_generic sata_sil24 libata scsi_mod piix dock ide_pci_generic ide_core ehci_hcd uhci_hcd usbcore r8169 thermal processor fan thermal_sys
Mar 8 21:19:08 atom kernel: [1953069.141042] Pid: 0, comm: swapper Tainted: G W 2.6.26-1-686 #1
Mar 8 21:19:08 atom kernel: [1953069.141042] [c012256f] warn_on_slowpath+0x40/0x66
Mar 8 21:19:08 atom kernel: [1953069.141042] [c01318e9] autoremove_wake_function+0xd/0x2d
Mar 8 21:19:08 atom kernel: [1953069.141042] [c011845d] __wake_up_common+0x2e/0x58
Mar 8 21:19:08 atom kernel: [1953069.141042] [c011a641] __wake_up+0x29/0x39
Mar 8 21:19:08 atom kernel: [1953069.141042] [f8940b46] md_wakeup_thread+0x1e/0x20 [md_mod]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f8974906] release_stripe+0x21/0x2e [raid456]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f897837a] raid5_end_write_request+0x0/0x99 [raid456]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f88e73d0] scsi_run_queue+0x200/0x219 [scsi_mod]
Mar 8 21:19:08 atom kernel: [1953069.141042] [c01d0175] elv_queue_empty+0x1d/0x1e
Mar 8 21:19:08 atom kernel: [1953069.141042] [f89111d5] ata_sff_hsm_move+0x5ff/0x674 [libata]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f88e7b3b] scsi_end_request+0x62/0x6b [scsi_mod]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f88e863b] scsi_io_completion+0x1a6/0x363 [scsi_mod]
Mar 8 21:19:08 atom kernel: [1953069.141042] [f8911e88] ata_sff_interrupt+0x127/0x19f [libata]
Mar 8 21:19:08 atom kernel: [1953069.141042] [c0151fd2] handle_IRQ_event+0x23/0x51
Mar 8 21:19:08 atom kernel: [1953069.141042] [c01530d1] handle_fasteoi_irq+0x71/0xa4
Mar 8 21:19:08 atom kernel: [1953069.141042] [c0105f3a] do_IRQ+0x4d/0x63
Mar 8 21:19:08 atom kernel: [1953069.141043] [c0108bbf] mwait_idle+0x0/0x3d
Mar 8 21:19:08 atom kernel: [1953069.141043] [c01042a7] common_interrupt+0x23/0x28
Mar 8 21:19:08 atom kernel: [1953069.141043] [c0108bbf] mwait_idle+0x0/0x3d
Mar 8 21:19:08 atom kernel: [1953069.141043] [c0108bee] mwait_idle+0x2f/0x3d
Mar 8 21:19:08 atom kernel: [1953069.141043] [c01025ce] cpu_idle+0xab/0xcb
Mar 8 21:19:08 atom kernel: [1953069.141043] ===

After that, the SSD connected to the board itself starts causing trouble:

Mar 8 21:19:08 atom kernel: [1953069.141042] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 8 21:19:08 atom kernel: [1953069.141042] ata5.00: BMDMA stat 0x26
Mar 8 21:19:08 atom kernel: [1953069.141042] ata5.00: cmd ca/00:08:a7:20:14/00:00:00:00:00/e0 tag 0 dma 4096 out
Mar 8 21:19:08 atom kernel: [1953069.141042] res 51/40:08:a7:20:14/00:00:00:00:00/e0 Emask 0x29 (host bus error)
Mar 8 21:19:08 atom kernel: [1953069.141042] ata5.00: status: { DRDY ERR }
Mar 8 21:19:08 atom kernel: [1953069.141042] ata5.00: error: { UNC }
Mar 8 21:19:38 atom kernel: [1953106.145781] ata5.00: qc timeout (cmd 0xef)
Mar 8 21:19:38 atom kernel: [1953106.145818] ata5.00: failed to set xfermode (err_mask=0x4)
Mar 8 21:19:38 atom kernel: [1953106.145845] ata5: failed to recover some devices, retrying in 5 secs
Mar 8 21:19:48 atom kernel: [1953119.090028] ata5: link is slow to respond, please be patient (ready=0)
Mar 8 21:19:53 atom kernel: [1953126.002047] ata5: device not ready (errno=-16), forcing hardreset
Mar 8 21:19:53 atom kernel: [1953126.002091] ata5: soft resetting link
Mar 8 21:19:58 atom kernel: [1953132.995359] ata5: link is slow to respond, please be patient (ready=0)
Mar 8 21:20:03 atom kernel: [1953139.929986] ata5: SRST failed (errno=-16)
Mar 8 21:20:03 atom kernel: [1953139.930027] ata5: soft resetting link
Mar 8 21:20:08 atom kernel: [1953146.559721] ata5: link is slow to respond, please be patient (ready=0)
Mar 8 21:20:13