dfly# uname -a
DragonFly dfly.bagdala2.net 5.0-RELEASE DragonFly v5.0.0.2.ga9d62-RELEASE #10: Tue Oct 17 07:25:14 EDT 2017 [email protected]:/usr/obj/usr/src/sys/X86_64_GENERIC x86_64

dfly# mount
ROOT on / (hammer, noatime, local)
devfs on /dev (devfs, nosymfollow, local)
/dev/serno/B620550018.s1a on /boot (ufs, local)
/pfs/@@-1:00001 on /var (null, local)
/pfs/@@-1:00002 on /tmp (null, local)
/pfs/@@-1:00003 on /home (null, local)
/pfs/@@-1:00004 on /usr/obj (null, local)
/pfs/@@-1:00005 on /var/crash (null, local)
/pfs/@@-1:00006 on /var/tmp (null, local)
procfs on /proc (procfs, local)
DATA on /data (hammer, noatime, local)
BACKUP on /backup (hammer, noatime, local)
/data/pfs/@@-1:00001 on /data/backups (null, local)
/data/pfs/@@-1:00002 on /data/nfs (null, NFS exported, local)
/dev/da3s1e@DATA on /test-hammer2 (hammer2, local)

dfly# smartctl -d sat -l selftest /dev/da1
smartctl 6.5 2016-05-07 r4318 [DragonFly 5.0-RELEASE x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     14053         1060611176
# 2  Short offline       Completed: read failure       90%     14029         1060611176
# 3  Short offline       Completed: read failure       90%     14005         1060611176
# 4  Extended offline    Completed: read failure       90%     13982         1060611176
# 5  Short offline       Completed: read failure       90%     13981         1060611176
# 6  Short offline       Completed: read failure       90%     13957         1060611176
# 7  Short offline       Completed: read failure       90%     13933         1060611176
# 8  Short offline       Completed: read failure       90%     13909         1060611176
# 9  Short offline       Completed: read failure       90%     13885         1060611176
#10  Short offline       Completed: read failure       90%     13861         1060611176
#11  Short offline       Completed: read failure       90%     13837         1060611176
#12  Extended offline    Completed: read failure       90%     13814         1060611176
#13  Short offline       Completed: read failure       90%     13813         1060611176
#14  Short offline       Completed without error       00%     13789         -
#15  Short offline       Completed without error       00%     13765         -
#16  Short offline       Completed without error       00%     13741         -
#17  Short offline       Completed without error       00%     13717         -
#18  Short offline       Completed without error       00%     13693         -
#19  Short offline       Completed without error       00%     13669         -
#20  Extended offline    Completed without error       00%     13654         -
#21  Short offline       Completed without error       00%     13645         -

as well as lots of messages like these in my dmesg:

ahci0.2: TFES slot 28 ci_saved = 10000000
ahci0.2: read NCQ error page slot=28
ahci0.2: DONE log page target 0 err_slot=28
ahci0.2: disk_rw: error fiscmd=0x60 @off=0x0000007e6f48c000, 32768
(da1:ahci0:2:0:0): READ(10). CDB: 28 0 3f 37 a4 60 0 0 40 0
(da1:ahci0:2:0:0): CAM Status: SCSI Status Error
(da1:ahci0:2:0:0): SCSI Status: Check Condition
(da1:ahci0:2:0:0): MEDIUM ERROR asc:0,0
(da1:ahci0:2:0:0): No additional sense information
(da1:ahci0:2:0:0): Retrying Command (per Sense Data)
ahci0.2: TFES slot 7 ci_saved = 00000080
ahci0.2: read NCQ error page slot=7
ahci0.2: DONE log page target 0 err_slot=7
ahci0.2: disk_rw: error fiscmd=0x60 @off=0x0000007e6f48c000, 32768
(da1:ahci0:2:0:0): READ(10). CDB: 28 0 3f 37 a4 60 0 0 40 0
(da1:ahci0:2:0:0): CAM Status: SCSI Status Error
(da1:ahci0:2:0:0): SCSI Status: Check Condition
(da1:ahci0:2:0:0): MEDIUM ERROR asc:0,0
(da1:ahci0:2:0:0): No additional sense information
(da1:ahci0:2:0:0): Retrying Command (per Sense Data)
ahci0.2: TFES slot 8 ci_saved = 00000100
ahci0.2: read NCQ error page slot=8
ahci0.2: DONE log page target 0 err_slot=8
ahci0.2: disk_rw: error fiscmd=0x60 @off=0x0000007e6f48c000, 32768
(da1:ahci0:2:0:0): READ(10). CDB: 28 0 3f 37 a4 60 0 0 40 0
(da1:ahci0:2:0:0): CAM Status: SCSI Status Error
(da1:ahci0:2:0:0): SCSI Status: Check Condition
(da1:ahci0:2:0:0): MEDIUM ERROR asc:0,0
(da1:ahci0:2:0:0): No additional sense information
(da1:ahci0:2:0:0): Retries Exhausted

What is the correct way to recover from the dying HDD? Should I stop
mirroring immediately and promote the slave to master before putting in
a new drive and making it the slave? How can I tell whether the data on
the current master is corrupted?

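To cross-check the two logs above, here is a minimal sketch that decodes the READ(10) CDB from dmesg and tests whether the LBA reported by smartctl falls inside the failing read. It assumes 512-byte logical sectors (not stated in the output above); the variable and function names are only illustrative.

```python
# Values copied from the smartctl and dmesg output above.
SECTOR_SIZE = 512  # ASSUMPTION: 512-byte logical sectors

smart_lba = 1060611176          # LBA_of_first_error in the self-test log
io_offset = 0x0000007e6f48c000  # byte offset after "@off=" in disk_rw line
io_length = 32768               # byte count on the same disk_rw line
cdb = [0x28, 0x00, 0x3f, 0x37, 0xa4, 0x60, 0x00, 0x00, 0x40, 0x00]


def read10_range(cdb_bytes):
    """Decode a READ(10) CDB into (start LBA, number of sectors).

    Bytes 2..5 hold the big-endian LBA, bytes 7..8 the transfer length.
    """
    lba = int.from_bytes(bytes(cdb_bytes[2:6]), "big")
    count = int.from_bytes(bytes(cdb_bytes[7:9]), "big")
    return lba, count


cdb_lba, cdb_count = read10_range(cdb)

# Sector range covered by the failing request, derived from the byte offset.
first = io_offset // SECTOR_SIZE
last = first + io_length // SECTOR_SIZE - 1

print("CDB start LBA / sectors:", cdb_lba, cdb_count)
print("request covers LBAs:", first, "to", last)
print("SMART LBA inside request:", first <= smart_lba <= last)
```

Under that assumption the byte offset and the CDB agree on the same start sector, and the failing read covers LBAs 1060611168 to 1060611231, which includes the 1060611176 reported by every failed self-test.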
Cheers,
Predrag
