Re: A sad raid/fsck story

2019-11-15 Thread sven falempin
On Sat, Oct 5, 2019 at 8:39 AM Nick Holland  wrote:
>
> On 10/4/19 8:37 AM, sven falempin wrote:
> ...
> > How [do I] check the state of the MIRROR RAID array, to detect a large
> > number of failures on one of the two disks?
> >
> > Best.
> >
>
> fsck has NOTHING to do with the status of your drives.
> It's a File System ChecKer.  Your disk can be covered with unreadable
> sectors but if the file system on that disk is intact, fsck reports
> no problem.  Conversely, your disks can be fine, but your file system
> can be scrambled beyond recognition; bad news from fsck doesn't mean
> your drive is bad.
>
> To check the status of the disks, you probably want to slip a call
> to bioctl into /etc/daily.local:
>
> # bioctl softraid0
> Volume      Status               Size Device
> softraid0 0 Online      7945693712896 sd2     RAID1
>           0 Online      7945693712896 0:0.0   noencl
>           1 Online      7945693712896 0:1.0   noencl
>
> This is a happy array.  If you have a bad drive, one of those
> physical drives is not going to be online.
>
> Nick.
>

My moral of the story is:

If your RAID array is not mounting, check SMART, check bioctl, run fsck on
each disk separately,
and then restore or dump the bad drive.
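The "check bioctl before fsck" order above could be sketched as a small shell function. This is only an illustration under assumptions: the name check_then_fsck is hypothetical, the caller passes the volume name (for example softraid0) and the raw device of a file system on it (for example /dev/rsd2a, a placeholder), and it covers only the bioctl check plus a read-only fsck -n pass on the volume. It does not attempt the per-disk steps, SMART checks, or fsck -y; those stay with the operator.

```sh
#!/bin/sh
# Sketch: look at the array before letting fsck change anything.
# check_then_fsck VOLUME FSDEV   (hypothetical helper)
#   VOLUME  softraid volume as given to bioctl, e.g. softraid0
#   FSDEV   raw device of a file system on that volume, e.g. /dev/rsd2a
check_then_fsck() {
    vol=$1
    fsdev=$2
    # bioctl itself failing (wrong name, no bio device) is also a stop sign.
    if ! report=$(bioctl "$vol" 2>&1); then
        echo "bioctl $vol failed; not running fsck:"
        printf '%s\n' "$report"
        return 1
    fi
    # Any non-empty line after the header without " Online " means the
    # volume or one of its chunks is in another state (e.g. Degraded, Offline).
    if printf '%s\n' "$report" |
        awk 'NR > 1 && NF > 0 && $0 !~ / Online / { bad = 1 }
             END { exit !bad }'; then
        echo "$vol is not fully Online; deal with the array first:"
        printf '%s\n' "$report"
        return 1
    fi
    # Array looks healthy: read-only pass. -n answers "no" to repair
    # questions and does not open the file system for writing.
    fsck -n "$fsdev"
}
```

For example, check_then_fsck softraid0 /dev/rsd2a prints the bioctl output and returns non-zero when the array is degraded; otherwise it runs fsck -n /dev/rsd2a and returns fsck's exit status.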

Next,

RAID 5 is cool. Does it know which disk failed the checksum?



Re: A sad raid/fsck story

2019-10-05 Thread Nick Holland
On 10/4/19 8:37 AM, sven falempin wrote:
...
> How [do I] check the state of the MIRROR RAID array, to detect a large
> number of failures on one of the two disks?
> 
> Best.
> 

fsck has NOTHING to do with the status of your drives.
It's a File System ChecKer.  Your disk can be covered with unreadable
sectors but if the file system on that disk is intact, fsck reports
no problem.  Conversely, your disks can be fine, but your file system
can be scrambled beyond recognition; bad news from fsck doesn't mean
your drive is bad.

To check the status of the disks, you probably want to slip a call
to bioctl into /etc/daily.local:

# bioctl softraid0
Volume      Status               Size Device
softraid0 0 Online      7945693712896 sd2     RAID1
          0 Online      7945693712896 0:0.0   noencl
          1 Online      7945693712896 0:1.0   noencl

This is a happy array.  If you have a bad drive, one of those
physical drives is not going to be online.

Nick.
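Following the /etc/daily.local suggestion above, a minimal sketch of such an addition. It assumes the volume is softraid0, as in the output shown, and treats any line after the bioctl header that lacks an "Online" status (for example Degraded or Offline) as worth reporting; the helper name softraid_unhealthy is hypothetical. Whatever daily.local prints normally ends up in the /etc/daily output that cron mails to root.

```sh
#!/bin/sh
# Sketch for /etc/daily.local (assumption: the volume is softraid0).
# Prints a warning plus the full bioctl output when the array is not
# fully Online, or when bioctl itself fails.

# softraid_unhealthy (hypothetical helper): reads bioctl output on stdin.
# Exit status 0 means "unhealthy": some non-empty line after the header
# lacks an " Online " status field.
softraid_unhealthy() {
    awk 'NR > 1 && NF > 0 && $0 !~ / Online / { bad = 1 }
         END { exit !bad }'
}

# bioctl(8) is in the OpenBSD base system; the guard only keeps the
# sketch harmless on systems without it.
if command -v bioctl >/dev/null 2>&1; then
    if ! out=$(bioctl softraid0 2>&1) ||
        printf '%s\n' "$out" | softraid_unhealthy; then
        echo "WARNING: softraid0 needs attention:"
        printf '%s\n' "$out"
    fi
fi
```

A happy array produces no output at all, so the daily mail only grows when something is wrong.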



Re: A sad raid/fsck story

2019-10-05 Thread Gwen Nelson
RAID is not a backup solution and should not be treated as one.

On Fri, Oct 4, 2019, 3:41 PM sven falempin  wrote:

> On Fri, Oct 4, 2019 at 8:10 AM Nick Holland 
> wrote:
> >
> > On 10/3/19 10:01 AM, sven falempin wrote:
> > > Dear readers,
> > >
> > > I was running an OpenBSD (6.4) device with a RAID mirror array.
> > > One of the disks failed, so the system asked me to run fsck,
> >
> > Probably not quite that simple.  More likely, the disk failed,
> > that took the system down hard, and it needed an fsck on reboot.
> > Which is normal, RAID or otherwise.
> >
> > > which I did before checking the RAID status manually ( :'( ),
> > > THEN I rebooted and softraid told me one of the hard drives was dead.
> > >
> > > But fsck had already destroyed a few files on the mirror.
> >
> > That seems unlikely.  That's not what fsck does -- fsck's job is to
> > repair a file system.  If it removes a file, the file is already
> > damaged.
> >
> > > Probably a user error; nevertheless, in the OpenBSD 'it simply works'
> > > mindset, maybe /etc/rc could warn, or even run a bioctl check on the
> > > RAID array, when the first fsck / mount fails.
> >
> > I'm not seeing what this has to do with RAID, soft or otherwise.  If your
> > system needed an fsck, it needed it whether it was a simple drive or a
> > RAID array.  If you need an fsck, you are likely to have lost data.
> >
> > > ( Lost data recovered from backup )
> >
> > And again...nothing to do with either fsck or RAID -- you have to have
> > a backup.  RAID doesn't change that.
> >
> > Nick.
> >
>
>
> Let me rephrase this as a question, because I clearly misled you into
> thinking that fsck -p from rc would delete files, or that having a
> backup is a bad idea. @_@
> I accept losing recent data with fsck -y, and I use it because I have
> a backup, but the data loss here was massive (old, untouched files).
>
> How do I check the state of the MIRROR RAID array, to detect a large
> number of failures on one of the two disks?
>
> Best.
>
> --
> --
> -
> Knowing is not enough; we must apply. Willing is not enough; we must do
>
>


Re: A sad raid/fsck story

2019-10-04 Thread sven falempin
On Fri, Oct 4, 2019 at 8:10 AM Nick Holland  wrote:
>
> On 10/3/19 10:01 AM, sven falempin wrote:
> > Dear readers,
> >
> > I was running an OpenBSD (6.4) device with a RAID mirror array.
> > One of the disks failed, so the system asked me to run fsck,
>
> Probably not quite that simple.  More likely, the disk failed,
> that took the system down hard, and it needed an fsck on reboot.
> Which is normal, RAID or otherwise.
>
> > which I did before checking the RAID status manually ( :'( ),
> > THEN I rebooted and softraid told me one of the hard drives was dead.
> >
> > But fsck had already destroyed a few files on the mirror.
>
> That seems unlikely.  That's not what fsck does -- fsck's job is to
> repair a file system.  If it removes a file, the file is already
> damaged.
>
> > Probably a user error; nevertheless, in the OpenBSD 'it simply works'
> > mindset, maybe /etc/rc could warn, or even run a bioctl check on the
> > RAID array, when the first fsck / mount fails.
>
> I'm not seeing what this has to do with RAID, soft or otherwise.  If your
> system needed an fsck, it needed it whether it was a simple drive or a
> RAID array.  If you need an fsck, you are likely to have lost data.
>
> > ( Lost data recovered from backup )
>
> And again...nothing to do with either fsck or RAID -- you have to have
> a backup.  RAID doesn't change that.
>
> Nick.
>


Let me rephrase this as a question, because I clearly misled you into
thinking that fsck -p from rc would delete files, or that having a
backup is a bad idea. @_@
I accept losing recent data with fsck -y, and I use it because I have
a backup, but the data loss here was massive (old, untouched files).

How do I check the state of the MIRROR RAID array, to detect a large
number of failures on one of the two disks?

Best.

-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do



Re: A sad raid/fsck story

2019-10-04 Thread Nick Holland
On 10/3/19 10:01 AM, sven falempin wrote:
> Dear readers,
> 
> I was running an OpenBSD (6.4) device with a RAID mirror array.
> One of the disks failed, so the system asked me to run fsck,

Probably not quite that simple.  More likely, the disk failed,
that took the system down hard, and it needed an fsck on reboot.
Which is normal, RAID or otherwise. 

> which I did before checking the RAID status manually ( :'( ),
> THEN I rebooted and softraid told me one of the hard drives was dead.
> 
> But fsck had already destroyed a few files on the mirror.

That seems unlikely.  That's not what fsck does -- fsck's job is to
repair a file system.  If it removes a file, the file is already
damaged.

> Probably a user error; nevertheless, in the OpenBSD 'it simply works'
> mindset, maybe /etc/rc could warn, or even run a bioctl check on the
> RAID array, when the first fsck / mount fails.

I'm not seeing what this has to do with RAID, soft or otherwise.  If your
system needed an fsck, it needed it whether it was a simple drive or a
RAID array.  If you need an fsck, you are likely to have lost data.

> ( Lost data recovered from backup )

And again...nothing to do with either fsck or RAID -- you have to have
a backup.  RAID doesn't change that.

Nick.



Re: A sad raid/fsck story

2019-10-03 Thread Aaron Mason
Thanks for the cautionary tale.  Will definitely keep this in mind for
any RAID arrays I manage.

On Fri, Oct 4, 2019 at 2:04 AM sven falempin  wrote:
>
> Dear readers,
>
> I was running an OpenBSD (6.4) device with a RAID mirror array.
> One of the disks failed, so the system asked me to run fsck,
> which I did before checking the RAID status manually ( :'( ),
> THEN I rebooted and softraid told me one of the hard drives was dead.
>
> But fsck had already destroyed a few files on the mirror.
>
> Probably a user error; nevertheless, in the OpenBSD 'it simply works'
> mindset, maybe /etc/rc could warn, or even run a bioctl check on the
> RAID array, when the first fsck / mount fails.
> fails.
>
> Cheers.
>
> ( Lost data recovered from backup )



-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse



A sad raid/fsck story

2019-10-03 Thread sven falempin
Dear readers,

I was running an OpenBSD (6.4) device with a RAID mirror array.
One of the disks failed, so the system asked me to run fsck,
which I did before checking the RAID status manually ( :'( ),
THEN I rebooted and softraid told me one of the hard drives was dead.

But fsck had already destroyed a few files on the mirror.

Probably a user error; nevertheless, in the OpenBSD 'it simply works'
mindset, maybe /etc/rc could warn, or even run a bioctl check on the
RAID array, when the first fsck / mount fails.

Cheers.

( Lost data recovered from backup )