Re: A sad raid/fsck story
On Sat, Oct 5, 2019 at 8:39 AM Nick Holland wrote:
>
> On 10/4/19 8:37 AM, sven falempin wrote:
> ...
> > How [do I] check the state of the MIRROR RAID array, to detect a large
> > number of failures on one of the two disks?
> >
> > Best.
> >
>
> fsck has NOTHING to do with the status of your drives.
> It's a File System ChecKer. Your disk can be covered with unreadable
> sectors but if the file system on that disk is intact, fsck reports
> no problem. Conversely, your disks can be fine, but your file system
> can be scrambled beyond recognition; bad news from fsck doesn't mean
> your drive is bad.
>
> To check the status of the disks, you probably want to slip a call
> to bioctl into /etc/daily.local:
>
> # bioctl softraid0
> Volume      Status               Size Device
> softraid0 0 Online      7945693712896 sd2     RAID1
>           0 Online      7945693712896 0:0.0   noencl
>           1 Online      7945693712896 0:1.0   noencl
>
> This is a happy array. If you have a bad drive, one of those
> physical drives is going to not be online.
>
> Nick.
>

My moral of the story is: if your RAID array is not mounting, check SMART,
check bioctl, fsck each disk separately, and then restore or dump the bad
drive.

Next: RAID 5 is cool. Does it know which disk failed the checksum?
Re: A sad raid/fsck story
On 10/4/19 8:37 AM, sven falempin wrote:
...
> How [do I] check the state of the MIRROR RAID array, to detect a large
> number of failures on one of the two disks?
>
> Best.
>

fsck has NOTHING to do with the status of your drives.
It's a File System ChecKer. Your disk can be covered with unreadable
sectors but if the file system on that disk is intact, fsck reports
no problem. Conversely, your disks can be fine, but your file system
can be scrambled beyond recognition; bad news from fsck doesn't mean
your drive is bad.

To check the status of the disks, you probably want to slip a call
to bioctl into /etc/daily.local:

# bioctl softraid0
Volume      Status               Size Device
softraid0 0 Online      7945693712896 sd2     RAID1
          0 Online      7945693712896 0:0.0   noencl
          1 Online      7945693712896 0:1.0   noencl

This is a happy array. If you have a bad drive, one of those
physical drives is going to not be online.

Nick.
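
A minimal sketch of what such a bioctl call in /etc/daily.local could look like. It assumes the output layout matches the sample above (status in the third column on the volume line and in the second column on each chunk line). The helper name check_bioctl_status and the "RAID WARNING" wording are made up for illustration; only bioctl and the softraid0 device name come from the message.

```shell
# Hypothetical helper (not part of OpenBSD): reads bioctl output on stdin,
# prints one line for every volume or chunk whose status is not "Online",
# and returns non-zero if it found any.
check_bioctl_status() {
    awk '
        BEGIN { bad = 0 }
        NF == 0 || $1 == "Volume" { next }   # skip blank lines and the header
        {
            # Chunk lines start with a numeric index, so status is field 2;
            # volume lines start with the volume name, so status is field 3.
            status = ($1 ~ /^[0-9]+$/) ? $2 : $3
            if (status != "Online") {
                $1 = $1                      # squeeze column padding to single spaces
                print "RAID WARNING: " $0
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Only call bioctl where it exists; it needs root, which daily.local has.
# Assumption: the volume is softraid0, as in the example above.
if command -v bioctl >/dev/null 2>&1; then
    bioctl softraid0 | check_bioctl_status
fi
```

On a happy array this prints nothing. Anything it does print should land in the nightly daily output, which as far as I know is mailed to root, so a degraded mirror gets noticed before the next unplanned fsck.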
Re: A sad raid/fsck story
RAID is not a backup solution and should not be treated as one.

On Fri, Oct 4, 2019, 3:41 PM sven falempin wrote:

> On Fri, Oct 4, 2019 at 8:10 AM Nick Holland wrote:
> >
> > On 10/3/19 10:01 AM, sven falempin wrote:
> > > Dear readers,
> > >
> > > I was running an OpenBSD (6.4) device with a RAID mirror array.
> > > One of the disks failed, so the system asked me to fsck,
> >
> > Probably not quite that simple. More likely, the disk failed,
> > that took the system down hard, and it needed an fsck on reboot.
> > Which is normal, RAID or otherwise.
> >
> > > which I did before checking the RAID status manually ( :'( ),
> > > THEN I rebooted and softraid told me: one of the hard drives is dead.
> > >
> > > But fsck already destroyed a few files on the mirror.
> >
> > That seems unlikely. That's not what fsck does -- fsck's job is to
> > repair a file system. If it removes a file, the file is already
> > damaged.
> >
> > > Probably a user error; nevertheless, in the OpenBSD 'simply works' mindset,
> > > maybe /etc/rc could warn or even perform some bioctl check on the RAID
> > > array when the first fsck / mount fails.
> >
> > I'm not seeing what this has to do with RAID, soft or otherwise. If your
> > system needed an fsck, it needed it whether it was a simple drive or a
> > RAID array. If you need an fsck, you are likely to have lost data.
> >
> > > ( Lost data recovered from backup )
> >
> > And again...nothing to do with either fsck or RAID -- you have to have
> > a backup. RAID doesn't change that.
> >
> > Nick.
> >
>
> Let me reformulate as a question, because I clearly misled you into
> thinking that fsck -p from rc would delete files or that having a backup
> is a bad idea. @_@
> I lose recent data with fsck -y, and I use it because I have a backup;
> the data loss here was massive (old untouched files).
>
> How do I check the state of the MIRROR RAID array, to detect a large
> number of failures on one of the two disks?
>
> Best.
>
> --
> Knowing is not enough; we must apply. Willing is not enough; we must do
Re: A sad raid/fsck story
On Fri, Oct 4, 2019 at 8:10 AM Nick Holland wrote:
>
> On 10/3/19 10:01 AM, sven falempin wrote:
> > Dear readers,
> >
> > I was running an OpenBSD (6.4) device with a RAID mirror array.
> > One of the disks failed, so the system asked me to fsck,
>
> Probably not quite that simple. More likely, the disk failed,
> that took the system down hard, and it needed an fsck on reboot.
> Which is normal, RAID or otherwise.
>
> > which I did before checking the RAID status manually ( :'( ),
> > THEN I rebooted and softraid told me: one of the hard drives is dead.
> >
> > But fsck already destroyed a few files on the mirror.
>
> That seems unlikely. That's not what fsck does -- fsck's job is to
> repair a file system. If it removes a file, the file is already
> damaged.
>
> > Probably a user error; nevertheless, in the OpenBSD 'simply works' mindset,
> > maybe /etc/rc could warn or even perform some bioctl check on the RAID
> > array when the first fsck / mount fails.
>
> I'm not seeing what this has to do with RAID, soft or otherwise. If your
> system needed an fsck, it needed it whether it was a simple drive or a
> RAID array. If you need an fsck, you are likely to have lost data.
>
> > ( Lost data recovered from backup )
>
> And again...nothing to do with either fsck or RAID -- you have to have
> a backup. RAID doesn't change that.
>
> Nick.
>

Let me reformulate as a question, because I clearly misled you into
thinking that fsck -p from rc would delete files or that having a backup
is a bad idea. @_@
I lose recent data with fsck -y, and I use it because I have a backup;
the data loss here was massive (old untouched files).

How do I check the state of the MIRROR RAID array, to detect a large
number of failures on one of the two disks?

Best.

--
Knowing is not enough; we must apply. Willing is not enough; we must do
Re: A sad raid/fsck story
On 10/3/19 10:01 AM, sven falempin wrote:
> Dear readers,
>
> I was running an OpenBSD (6.4) device with a RAID mirror array.
> One of the disks failed, so the system asked me to fsck,

Probably not quite that simple. More likely, the disk failed,
that took the system down hard, and it needed an fsck on reboot.
Which is normal, RAID or otherwise.

> which I did before checking the RAID status manually ( :'( ),
> THEN I rebooted and softraid told me: one of the hard drives is dead.
>
> But fsck already destroyed a few files on the mirror.

That seems unlikely. That's not what fsck does -- fsck's job is to
repair a file system. If it removes a file, the file is already
damaged.

> Probably a user error; nevertheless, in the OpenBSD 'simply works' mindset,
> maybe /etc/rc could warn or even perform some bioctl check on the RAID
> array when the first fsck / mount fails.

I'm not seeing what this has to do with RAID, soft or otherwise. If your
system needed an fsck, it needed it whether it was a simple drive or a
RAID array. If you need an fsck, you are likely to have lost data.

> ( Lost data recovered from backup )

And again...nothing to do with either fsck or RAID -- you have to have
a backup. RAID doesn't change that.

Nick.
Re: A sad raid/fsck story
Thanks for the cautionary tale. Will definitely keep this in mind for
any RAID arrays I manage.

On Fri, Oct 4, 2019 at 2:04 AM sven falempin wrote:
>
> Dear readers,
>
> I was running an OpenBSD (6.4) device with a RAID mirror array.
> One of the disks failed, so the system asked me to fsck,
> which I did before checking the RAID status manually ( :'( ),
> THEN I rebooted and softraid told me: one of the hard drives is dead.
>
> But fsck already destroyed a few files on the mirror.
>
> Probably a user error; nevertheless, in the OpenBSD 'simply works' mindset,
> maybe /etc/rc could warn or even perform some bioctl check on the RAID
> array when the first fsck / mount fails.
>
> Cheers.
>
> ( Lost data recovered from backup )

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse
A sad raid/fsck story
Dear readers,

I was running an OpenBSD (6.4) device with a RAID mirror array.
One of the disks failed, so the system asked me to fsck,
which I did before checking the RAID status manually ( :'( ),
THEN I rebooted and softraid told me: one of the hard drives is dead.

But fsck already destroyed a few files on the mirror.

Probably a user error; nevertheless, in the OpenBSD 'simply works' mindset,
maybe /etc/rc could warn or even perform some bioctl check on the RAID
array when the first fsck / mount fails.

Cheers.

( Lost data recovered from backup )