Bug#801975: linux-image-3.16.0-0.bpo.4-amd64: pm80xx report failing drive but mdadm doesn't set this drive failing

2017-08-13 Thread Ben Hutchings
Control: reassign -1 src:linux 3.16.7-ckt11-1
Control: tag -1 moreinfo

I'm sorry you didn't receive a response to this earlier.  The bug
tracking system didn't properly handle binary packages that only exist
in backports suites, so your report was not sent to the linux
maintainers.  It appears that that has been fixed, as your report now
shows up in a list of bugs for the linux source package.

Were you ever able to determine how to trigger this data loss?  Did you
find a fix?

Ben.

-- 
Ben Hutchings
If you seem to know what you are doing, you'll be given more to do.





Bug#801975: linux-image-3.16.0-0.bpo.4-amd64: pm80xx report failing drive but mdadm doesn't set this drive failing

2015-10-16 Thread QUOST Xavier
Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt11-1~bpo70+1
Justification: causes serious data loss
Severity: critical
Subject: linux-image-3.16.0-0.bpo.4-amd64: pm80xx report failing drive but mdadm doesn't set this drive failing

Hello,

We encountered a serious bug involving an Adaptec 6805H controller to
which 8 drives were connected.
Those drives were configured as RAID 6 with LVM.

On October 4th, the pm80xx module reported a failing disk, but the disk
was not reported as faulty by mdadm, and the data became corrupted.

We are trying to reproduce the bug on non-critical data, but with no
success so far. In my view, however, the data loss was serious enough
to report the bug regardless.


Notes:
(1) We have reassembled the machine for data retrieval, adding an ASUS
controller card (so it is unrelated to the reported bug).
(2) This may be related to bug 774583, as the problems appeared after a
checkarray run.

thanks

best regards
Xavier Quost



-- Package-specific info:
** Version:
Linux version 3.16.0-0.bpo.4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.16.7-ckt11-1~bpo70+1 (2015-06-08)

Package: mdadm
Version: 3.2.5-5

** Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-0.bpo.4-amd64 root=UUID=23b97aa0-53d0-42c4-afd0-48f302de1b08 ro quiet processor.max_cstate=1 idle=poll nox2apic intermap=off

** Tainted: WO (4608)
 * Taint on warning.
 * Out-of-tree module has been loaded.
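The taint value 4608 is a bitmask: 4608 = 4096 + 512, i.e. bit 12 (O, out-of-tree module) plus bit 9 (W, warning). A minimal Python sketch that decodes such a value follows; the letter table is an assumption based on the taint flags documented for kernels of this era (bits 0 to 13 only), and `decode_taint` is a hypothetical helper, not part of reportbug or the kernel.

```python
# Assumed bit -> letter table for kernel taint flags (3.16-era, bits 0-13).
TAINT_FLAGS = {
    0: "P",   # proprietary module loaded
    1: "F",   # module force-loaded
    2: "S",   # SMP kernel on a CPU not designed for SMP
    3: "R",   # module force-unloaded
    4: "M",   # machine check exception occurred
    5: "B",   # bad page referenced
    6: "U",   # taint requested from userspace
    7: "D",   # kernel died recently (oops or BUG)
    8: "A",   # ACPI table overridden
    9: "W",   # kernel issued a warning
    10: "C",  # staging driver loaded
    11: "I",  # firmware bug workaround applied
    12: "O",  # out-of-tree module loaded
    13: "E",  # unsigned module loaded
}


def decode_taint(value: int) -> str:
    """Return the letters for every set bit, lowest bit first ('?' if unknown)."""
    letters = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letters.append(TAINT_FLAGS.get(bit, "?"))
        bit += 1
    return "".join(letters)


if __name__ == "__main__":
    print(decode_taint(4608))
```

Running it on the value above yields "WO", matching the two flags listed in the report.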

** Kernel log:
Oct  4 00:57:01 nassli kernel: [3944417.125573] md: data-check of RAID array md0
Oct  4 00:57:01 nassli kernel: [3944417.125576] md: minimum _guaranteed_ speed: 5 KB/sec/disk.
Oct  4 00:57:01 nassli kernel: [3944417.125577] md: using maximum available idle IO bandwidth (but not more than 500 KB/sec) for data-check.
Oct  4 00:57:01 nassli kernel: [3944417.125582] md: using 128k window, over a total of 3907016192k.
Oct  4 08:41:31 nassli kernel: [3972277.588119] pm80xx mpi_sata_event 2689:SATA EVENT 0x23
Oct  4 08:41:31 nassli kernel: [3972277.588125] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.588315] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.588634] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.588945] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.589256] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.589563] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.589875] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:31 nassli kernel: [3972277.590186] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:57 nassli kernel: [3972277.590494] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:57 nassli kernel: [3972304.318093] sas: Enter sas_scsi_recover_host busy: 30 failed: 30
Oct  4 08:41:57 nassli kernel: [3972304.318099] sas: trying to find task 0x8802b80d6440
Oct  4 08:41:57 nassli kernel: [3972304.318100] sas: sas_scsi_find_task: aborting task 0x8802b80d6440
Oct  4 08:41:57 nassli kernel: [3972304.318254] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:57 nassli kernel: [3972304.318260] sas: sas_scsi_find_task: task 0x8802b80d6440 is done
Oct  4 08:41:57 nassli kernel: [3972304.318261] sas: sas_eh_handle_sas_errors: task 0x8802b80d6440 is done
Oct  4 08:41:57 nassli kernel: [3972304.318262] sas: trying to find task 0x8803d7ba6c00
Oct  4 08:41:57 nassli kernel: [3972304.318263] sas: sas_scsi_find_task: aborting task 0x8803d7ba6c00
Oct  4 08:41:57 nassli kernel: [3972304.318402] pm80xx mpi_sata_completion 2373:SAS Address of IO Failure Drive:5d1106ee7590
Oct  4 08:41:57 nassli kernel: [3972304.318404] sas: sas_scsi_find_task: task 0x8803d7ba6c00 is done
Oct  4 08:41:57 nassli kernel: [3972304.318405] sas: sas_eh_handle_sas_errors: task 0x8803d7ba6c00 is done
Oct  4 08:41:57 nassli kernel: