Re: [CentOS] Intel RST RAID 1, partition tables and UUIDs

2020-11-16 Thread hw
On Mon, 2020-11-16 at 18:06 -0500, H wrote:
> On 11/16/2020 01:23 PM, Jonathan Billings wrote:
> > On Sun, Nov 15, 2020 at 07:49:09PM -0500, H wrote:
> > > I have been having some problems with hardware RAID 1 on the
> > > motherboard that I am running CentOS 7 on. After a BIOS upgrade of
> > > the system, I lost the RAID 1 setup and was no longer able to boot
> > > the system. 
> > The Intel RST RAID (aka Intel Matrix RAID) is also known as a
> > fakeraid.  It isn't a hardware RAID, but instead a software RAID that
> > has a fancy BIOS interface.  I believe that the mdadm tool can examine
> > the RAID settings, and you can look at /proc/mdstat to see its status,
> > although from what I remember from previous posts, it's better to just
> > let the BIOS think it's a JBOD and use the linux software RAID tools
> > directly. 
> > 
> I see, thank you. Right now I am running off one of the disks because of the 
> mishap. I am also waiting for a systemboard replacement, at which time I can 
> decide whether to go with Linux software RAID, i.e. mdadm, or back to the 
> Intel BIOS RAID.
> 
> The latter lacks any progress indicators in the BIOS when rebuilding an array, 
> which took around 20 hours for a 256 GB RAID 1 setup, and it is annoying not 
> to know the status of the rebuild etc. Could mdadm in a command window have 
> helped me answer that question?
> 
> Also, it seemed that the BIOS RAID damaged the partition table on the disks - 
> should I expect that this happens? My guess would be no but what do I know...

I'd use software raid rather than the fakeraid.  One of the advantages is that
you are not limited to the mainboard and can use the disks in another machine
if you need to.  If you need to replace the board, you are not limited to
one that provides a compatible fakeraid.

Using software raid with mdadm gives you an indication of the progress of
rebuilds and checks via /proc/mdstat, and, if so configured, it can
automatically send you an email when a disk fails.  Being informed about
disk failures matters.
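
For example (just a sketch; the device name and mail address are placeholders):

  cat /proc/mdstat                 # shows resync/check progress as a percentage
  mdadm --detail /dev/md0          # state of the array and its members

  # in /etc/mdadm.conf, with the mdmonitor service running:
  MAILADDR you@example.com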

I've used Linux software raid for at least 20 years and never had any problems
with it, other than arguably worse performance compared to hardware raid.
When a disk fails, you just replace it.  I've recently found that with hardware
raid it can be impossible to get a rebuild started at all, which makes it
virtually useless.

I've never used the (Intel) fakeraid.  Why would I?

If you don't require CentOS, you could go with Fedora instead.  Fedora now uses
btrfs as its default file system, which has software raid built in, and Fedora
can have advantages over CentOS.
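
A mirrored btrfs across two disks is just one command, e.g. (a sketch, device
names are placeholders):

  mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY
  btrfs filesystem show            # lists both devices in the filesystem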


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] Best practice preparing for disk restoring system

2020-11-16 Thread H
Short of backing up entire disks using dd, I'd like to collect all required 
information to make sure I can restore partitions, disk information, UUIDs and 
anything else required in the event of losing a disk.

So far I am collecting information from:
- fdisk -l
- blkid
- lsblk
- grub2-efi.cfg
- grub
- fstab

Hoping that this would supply me with /all/ the information needed to restore a 
system - with the exception of the installed operating system, apps and data.
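
Concretely, I am thinking of a small script along these lines (just a sketch;
the output directory and device name are arbitrary):

  #!/bin/sh
  # collect layout information needed to recreate the partitioning later
  d=/root/disk-info
  mkdir -p "$d"
  fdisk -l                 > "$d/fdisk-l.txt"
  blkid                    > "$d/blkid.txt"
  lsblk -o +UUID,PARTUUID  > "$d/lsblk.txt"
  sfdisk --dump /dev/sda   > "$d/sda.sfdisk"   # restore: sfdisk /dev/sda < sda.sfdisk
  cp /etc/fstab /etc/default/grub "$d/"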

I would appreciate any and all thoughts on the above!
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intel RST RAID 1, partition tables and UUIDs

2020-11-16 Thread H
On 11/16/2020 03:36 PM, John Pierce wrote:
> the main advantage I know of for bios fake-raid is that the bios can boot
> off either of the two mirrored boot devices.  Usually if the sata0 device
> has failed, the BIOS isn't smart enough to boot from sata1.
>
> the only other reason is if you're running MS Windows desktop which can't
> do mirroring on its own
>
> On Mon, Nov 16, 2020 at 10:23 AM Jonathan Billings 
> wrote:
>
>> On Sun, Nov 15, 2020 at 07:49:09PM -0500, H wrote:
>>> I have been having some problems with hardware RAID 1 on the
>>> motherboard that I am running CentOS 7 on. After a BIOS upgrade of
>>> the system, I lost the RAID 1 setup and was no longer able to boot
>>> the system.
>> The Intel RST RAID (aka Intel Matrix RAID) is also known as a
>> fakeraid.  It isn't a hardware RAID, but instead a software RAID that
>> has a fancy BIOS interface.  I believe that the mdadm tool can examine
>> the RAID settings, and you can look at /proc/mdstat to see its status,
>> although from what I remember from previous posts, it's better to just
>> let the BIOS think it's a JBOD and use the linux software RAID tools
>> directly.
>>
>> --
>> Jonathan Billings 
>
Thank you. As I mentioned, I am running from one disk, but the two disks have 
identical disk UUIDs and identical partition UUIDs, both of which I assume are 
an effect of the BIOS fake RAID.

If I were to go with Linux mdadm and a RAID 1 configuration, am I correct in 
assuming that I would:

- decide which one is the "master" disk

- configure mdadm to sync to the other disk

Would I need to change the disk UUIDs and partition UUIDs on the second disk 
prior to this, or would mdadm synchronize as needed?
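
For concreteness, this is roughly what I have in mind (device names are only
placeholders, and this is just a sketch of my understanding, not a tested
procedure):

  # create a degraded RAID 1 using only the second disk
  mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
  # put a filesystem on it, copy the data over, point fstab/grub at it,
  # then add the original disk, which gets overwritten and resynced
  mdadm --add /dev/md0 /dev/sda1
  cat /proc/mdstat    # watch the resync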

Thanks.

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intel RST RAID 1, partition tables and UUIDs

2020-11-16 Thread H
On 11/16/2020 01:23 PM, Jonathan Billings wrote:
> On Sun, Nov 15, 2020 at 07:49:09PM -0500, H wrote:
>> I have been having some problems with hardware RAID 1 on the
>> motherboard that I am running CentOS 7 on. After a BIOS upgrade of
>> the system, I lost the RAID 1 setup and was no longer able to boot
>> the system. 
> The Intel RST RAID (aka Intel Matrix RAID) is also known as a
> fakeraid.  It isn't a hardware RAID, but instead a software RAID that
> has a fancy BIOS interface.  I believe that the mdadm tool can examine
> the RAID settings, and you can look at /proc/mdstat to see its status,
> although from what I remember from previous posts, it's better to just
> let the BIOS think it's a JBOD and use the linux software RAID tools
> directly. 
>
I see, thank you. Right now I am running off one of the disks because of the 
mishap. I am also waiting for a systemboard replacement, at which time I can 
decide whether to go with Linux software RAID, i.e. mdadm, or back to the Intel 
BIOS RAID.

The latter lacks any progress indicators in the BIOS when rebuilding an array, 
which took around 20 hours for a 256 GB RAID 1 setup, and it is annoying not to 
know the status of the rebuild etc. Could mdadm in a command window have helped 
me answer that question?

Also, it seemed that the BIOS RAID damaged the partition table on the disks - 
should I expect that this happens? My guess would be no but what do I know...

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] (C8) root on mdraid

2020-11-16 Thread Gordon Messmer

On 11/15/20 10:40 PM, Łukasz Posadowski wrote:
> Sun, 15 Nov 2020 14:16:48 -0800 Gordon Messmer :
>> Use metadata version 1.2 instead of 0.9.
>
> Thanks, I'll try that. I'm used to metadata 0.9, because GRUB has
> (had?) some issues with the newer ones.

If that doesn't work, and you need to use metadata 0.9, then check 
/etc/default/grub and make sure that GRUB_CMDLINE_LINUX contains 
"rd.md.uuid=".


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intel RST RAID 1, partition tables and UUIDs

2020-11-16 Thread John Pierce
the main advantage I know of for bios fake-raid is that the bios can boot
off either of the two mirrored boot devices.  Usually if the sata0 device
has failed, the BIOS isn't smart enough to boot from sata1.

the only other reason is if you're running an MS Windows desktop, which can't
do mirroring on its own.
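
with linux md on a BIOS-boot system, the usual workaround is simply to install
the boot loader on both drives so either one can boot, roughly (device names
assumed, and UEFI is a different story):

  grub2-install /dev/sda
  grub2-install /dev/sdb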

On Mon, Nov 16, 2020 at 10:23 AM Jonathan Billings 
wrote:

> On Sun, Nov 15, 2020 at 07:49:09PM -0500, H wrote:
> >
> > I have been having some problems with hardware RAID 1 on the
> > motherboard that I am running CentOS 7 on. After a BIOS upgrade of
> > the system, I lost the RAID 1 setup and was no longer able to boot
> > the system.
>
> The Intel RST RAID (aka Intel Matrix RAID) is also known as a
> fakeraid.  It isn't a hardware RAID, but instead a software RAID that
> has a fancy BIOS interface.  I believe that the mdadm tool can examine
> the RAID settings, and you can look at /proc/mdstat to see its status,
> although from what I remember from previous posts, it's better to just
> let the BIOS think it's a JBOD and use the linux software RAID tools
> directly.
>
> --
> Jonathan Billings 


-- 
-john r pierce
  recycling used bits in santa cruz
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Intel RST RAID 1, partition tables and UUIDs

2020-11-16 Thread Jonathan Billings
On Sun, Nov 15, 2020 at 07:49:09PM -0500, H wrote:
>
> I have been having some problems with hardware RAID 1 on the
> motherboard that I am running CentOS 7 on. After a BIOS upgrade of
> the system, I lost the RAID 1 setup and was no longer able to boot
> the system. 

The Intel RST RAID (aka Intel Matrix RAID) is also known as a
fakeraid.  It isn't a hardware RAID, but instead a software RAID that
has a fancy BIOS interface.  I believe that the mdadm tool can examine
the RAID settings, and you can look at /proc/mdstat to see its status,
although from what I remember from previous posts, it's better to just
let the BIOS think it's a JBOD and use the linux software RAID tools
directly. 
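
Something along these lines should show what mdadm sees of the RST metadata
(a sketch, not verified on your particular hardware):

  mdadm --detail-platform      # what the Intel option ROM / firmware supports
  mdadm --examine /dev/sda     # per-disk metadata, incl. IMSM containers
  cat /proc/mdstat             # status of any assembled arrays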

-- 
Jonathan Billings 
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread Jeffrey Layton
On Mon, Nov 16, 2020 at 4:49 PM  wrote:

> On 11/16/20 at 15:43, Jeffrey Layton wrote:
> > Thanks everyone for the help! I'm still struggling to get it working. I
> > think I will have to go back and start simple: (1) one DIMM, (2) New PS,
> > (3) maybe new MB (I can't ever access the BIOS any more).
> >
> > Jeff
> >
> If you are going to discard all that, perhaps you can try resetting the CMOS
> before anything else. Locate the jumper and close it. BEWARE! Have the power
> disconnected from the mains, or you may see your PCB tracks burning!
>

Good idea - I will try that as well.


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread jtj

On 11/16/20 at 15:43, Jeffrey Layton wrote:
> Thanks everyone for the help! I'm still struggling to get it working. I
> think I will have to go back and start simple: (1) one DIMM, (2) New PS,
> (3) maybe new MB (I can't ever access the BIOS any more).
>
> Jeff

If you are going to discard all that, perhaps you can try resetting the CMOS 
before anything else. Locate the jumper and close it. BEWARE! Have the power 
disconnected from the mains, or you may see your PCB tracks burning!

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] mdadm raid-check

2020-11-16 Thread Valeri Galtsev


> On Nov 16, 2020, at 2:48 AM, hw  wrote:
> 
> On Sat, 2020-11-14 at 21:55 -0600, Valeri Galtsev wrote:
>>> On Nov 14, 2020, at 8:20 PM, hw  wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> is it required to run /usr/sbin/raid-check once per week?  Centos 7 does
>>> this.  Maybe it's sufficient to run it monthly?  IIRC Debian did it monthly.
>> 
>> On hardware RAIDs I do RAID verification once a week. Once a month is
>> not often enough in my book. That RAID verification effectively
>> reads all stripes of all drives (and verifies that the content of
>> redundant drives is consistent), thus preventing a “time bomb”: a
>> drive left alone for too long, ready to fail in an area which is
>> never accessed, and failing only when at some point a different drive
>> is replaced and the RAID rebuild has to go over all stripes of all
>> drives. Such “multiple failures” are due to poor sysadmin’s work:
>> not frequent enough RAID verification.
> 
> You mean there can be failures which can be detected during a
> raid-check and can still be repaired using the other disk, but they
> can be impossible to repair when a disk has failed?

No, what I meant to say is: the errors could have been detected, and the drive 
would have been kicked out of the RAID (not the errors repaired) and replaced with 
a good drive long ago. But if the RAID is not checked often, there is a potential 
that more drives than the redundancy covers have failed (in different areas) and 
are waiting to be kicked out, and when that happens the failure becomes fatal.
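
As far as I understand, on the software side the raid-check script does
essentially this (a sketch, the md device name is assumed):

  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat                       # progress of the check
  cat /sys/block/md0/md/mismatch_cnt     # non-zero means inconsistent stripes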

>> If software raid-check does the same, then it makes a lot of sense,
>> and I am more with RedHat's weekly cron job, than with Debian’s
>> Monthly.
> 
> How often do partial failures occur during normal operation?

I do not know what you mean by “partial failures”. I can imagine:

1. A checksum does not match, and there is no reason to suspect any particular 
drive as the one the wrong information comes from. If it is RAID-6, under the 
assumption that only one drive provided wrong information, the wrong drive can be 
pinpointed and the stripe on it overwritten, and the event is over without data 
being messed up. If it is RAID-5, there is no way to pinpoint the wrong drive; if 
your setting in the RAID firmware (I am speaking only about hardware RAIDs here) 
is to overwrite the “parity”, there is a fair chance that the stripe on the drive 
that gave correct information is overwritten, and the content of the RAID device 
is damaged.

2. A checksum does not match and one of the drives responded with a significant 
delay. If there is no other way to pinpoint which drive the wrong information came 
from, the drive with the delay is a fair suspect (it had to take time to read the 
“bad block” multiple times and maybe re-allocate it). With fair certainty (but not 
100%), the RAID will handle the situation without data corruption.

3. One of the drives timed out or reported an I/O error. The drive will be kicked 
out of the RAID, and it is up to the operator to decide whether to replace it or 
to attempt to rebuild the RAID onto the same drive.

>  In case
> there was a power failure, it's probably a good idea to do a check
> anyway.

If you care about the data on your RAID, you will use a battery backup unit, 
which keeps the content of the volatile RAM cache without losing it, so that when 
power has returned the cache can be flushed to the drives. (Without cache, 
hardware RAID devices are noticeably slower than with cache enabled.) 
[Non-volatile caches and supercapacitors are used as well.]

However, the drives themselves have volatile memory as cache, which will 
evaporate when power suddenly disappears. To make things worse, drives are 
designed to lie about “transaction complete” (thus manufacturers can declare 
better specs than those of competitors), and “transaction complete” is reported 
while the data is still in the drive's volatile cache, not on the platters. As 
far as I know, there is no way to query a drive and get an honest answer as to 
whether the data is already on the platters or not. Therefore, a hardware RAID 
card may think some transactions are completed, but they may never be completed 
in case of power loss.
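
If you want to see (or disable) the drive's own volatile write cache, hdparm can
do it (a sketch; whether the performance hit of disabling it is acceptable is a
separate question):

  hdparm -W /dev/sda       # query the write-cache setting
  hdparm -W0 /dev/sda      # disable the drive's write cache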

So, when power suddenly goes… it is potentially a mess on an I/O-intense box. 
Even with RAID battery backup for the cache (or the RAID cache disabled), having 
the machine behind a UPS, and starting a clean shutdown when the battery in the 
UPS has less than [3 minutes in my case, yours may be different] juice left, is a 
good idea.


I hope, this helps.

Valeri

>> Valeri
>> 
>>> I just checked on Fedora 32.  It does not run raid-check at all, at least 
>>> not
>>> via a cron entry.  /usr/sbin/raid-check is available, though.  Is that an
>>> oversight?  (I started it manually now and will check if it's run once I 
>>> update
>>> to 33.)
> 

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread Jeffrey Layton
Thanks everyone for the help! I'm still struggling to get it working. I
think I will have to go back and start simple: (1) one DIMM, (2) New PS,
(3) maybe new MB (I can't ever access the BIOS any more).

Jeff


On Mon, Nov 16, 2020 at 9:54 AM José María Terry Jiménez 
wrote:

> On 11/16/20 at 10:03, hw wrote:
> > On Mon, 2020-11-16 at 09:58 +0100, hw wrote:
> >> [...]
> >> Put a minimal amount of RAM in and go through all of the modules to see
> if
> >> one or some of them are broken.
> > Replace all RAM or test it in another computer.
> >
> >> Replace the power supply.
> > Replace CPU or test it in another mainboard.
> >
> >> Replace mainboard.
> > If the board has a backup BIOS, use that to boot.
> >
> >
>
> Hello
>
> He said (or so I understand) that it worked with Ubuntu until he tried CentOS.
> Maybe something about UEFI? I don't know, but it is worth a try.
>
> Best
>
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] (C8) root on mdraid

2020-11-16 Thread John Pierce
On Mon, Nov 16, 2020, 2:29 AM Tony Mountifield  wrote:

>
> I thought it was much more usual to partition both disks to give sda1,2,3
> and sdb1,2,3, and then create /dev/md0 from sda1/sdb1, /dev/md1 from
> sda2/sdb2,
> and so on.
>

What I always did was to mdraid a single full-disk partition and then use LVM
for any file systems.  Boot disks did need a separate /boot partition.
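
Roughly (device names, volume group and size are just examples):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 50G -n root vg0
  # /boot lives on its own small partition (or its own small md mirror)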

>
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] (C8) root on mdraid

2020-11-16 Thread Tony Mountifield
In article <20201115123245.db62b8248e1f248afe028...@lukaszposadowski.pl>,
Lukasz Posadowski  wrote:
> 
> Hello everyone. 
> 
> I'm trying to install CentOS 8 with root and swap partitions on
> software raid. The plan is:
> - create md0 raid level 1 with 2 hard drives: /dev/sda and /dev/sdb,
> using a Linux Rescue CD,
> - install CentOS 8 with Virtual Box on my laptop,
> - rsync CentOS 8 root partition on /dev/md0p1,
> - chroot in CentOS 8 root partition,
> - configure /etc/mdadm.conf, grub.cfg, initramfs, install bootloader on
> both sda and sdb drives.
> 
> I think I can do first four of the above, but my CentOS installation
> acts strange after rebooting the server. It recognizes the raid, but
> boots randomly with root on /dev/sda1 (and recognizes raid
> with /dev/sdb disk), or with root on /dev/sdb1 (and recognizes raid
> with /dev/sda disk). When booting from Linux Rescue CD, the raid with
> two disks is recognized.

I thought it was much more usual to partition both disks to give sda1,2,3
and sdb1,2,3, and then create /dev/md0 from sda1/sdb1, /dev/md1 from sda2/sdb2,
and so on.

That's the way I have always done it, and I have never had any problems.
I have never seen an attempt to partition an md device before. In that case,
how would the kernel and initrd be found in order to assemble the RAID?
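
For reference, the layout I mean looks roughly like this (partition numbering
and purpose are just an example):

  # sda1/sdb1 -> /boot, sda2/sdb2 -> swap, sda3/sdb3 -> /
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3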

Cheers
Tony
-- 
Tony Mountifield
Work: t...@softins.co.uk - http://www.softins.co.uk
Play: t...@mountifield.org - http://tony.mountifield.org
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread José María Terry Jiménez

On 11/16/20 at 10:03, hw wrote:
> On Mon, 2020-11-16 at 09:58 +0100, hw wrote:
>> [...]
>> Put a minimal amount of RAM in and go through all of the modules to see if
>> one or some of them are broken.
>
> Replace all RAM or test it in another computer.
>
>> Replace the power supply.
>
> Replace CPU or test it in another mainboard.
>
>> Replace mainboard.
>
> If the board has a backup BIOS, use that to boot.

Hello

He said (or so I understand) that it worked with Ubuntu until he tried CentOS. 
Maybe something about UEFI? I don't know, but it is worth a try.

Best

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread hw
On Mon, 2020-11-16 at 09:58 +0100, hw wrote:
> 
> [...]
> Put a minimal amount of RAM in and go through all of the modules to see if
> one or some of them are broken.

Replace all RAM or test it in another computer.

> Replace the power supply.

Replace CPU or test it in another mainboard.

> Replace mainboard.

If the board has a backup BIOS, use that to boot.


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Apologies - possible hardware problem?

2020-11-16 Thread hw
On Sun, 2020-11-15 at 18:54 +, Jeffrey Layton wrote:
> Good afternoon,
> 
> I have a home workstation with an AMD CPU, Titan V GPU, 32 GB of memory,
> and a root SSD and /home on spinning disks.
> 
> Right now it has xubuntu 18.04 on it and it would boot fine. I shut it down
> and restarted it to get an inventory before I put CentOS 8.2 on it. It
> won't boot now. It gets to the grub menu and freezes. I can't use the
> keyboard to select an item in the menu and I can't press enter to make it
> boot or press "e" to edit the boot line. It just sits there (seemingly
> forever). Here's what I've tried:
> 
> 1. New keyboard/mouse - no change
> 
> 2. Different monitor - no change
> 
> 3. Booting from the CentOS 8.2 iso on a USB stick - no change
> 
> 4. Replacing the TItan V card with a GT 1030 NV card - no change
> 
> When booting from a USB stick, I get the BIOS splash screen and press "DEL"
> to get to the menu, but the menu never shows up. It just freezes.
> 
> This one has me stumped. Not being able to boot from a USB stick is really
> puzzling. I've never seen that before. Possible bad MB?
> 
> My apologies for using the list to help debug problems but since I'm moving
> to CentOS 8.2 I thought people might have some ideas.

It's not clear to me whether the keyboard is working or not.  I'd try a PS/2
keyboard if it doesn't, and/or unplug and replug the USB one when the
computer is frozen.  What happens when you boot without a keyboard?

Put a minimal amount of RAM in and go through all of the modules to see if
one or some of them are broken.

Replace the power supply.

Replace mainboard.


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] mdadm raid-check

2020-11-16 Thread hw
On Sat, 2020-11-14 at 21:55 -0600, Valeri Galtsev wrote:
> > On Nov 14, 2020, at 8:20 PM, hw  wrote:
> > 
> > 
> > Hi,
> > 
> > is it required to run /usr/sbin/raid-check once per week?  Centos 7 does
> > this.  Maybe it's sufficient to run it monthly?  IIRC Debian did it monthly.
> 
> On hardware RAIDs I do RAID verification once a week. Once a month is
> not often enough in my book. That RAID verification effectively
> reads all stripes of all drives (and verifies that the content of
> redundant drives is consistent), thus preventing a “time bomb”: a
> drive left alone for too long, ready to fail in an area which is
> never accessed, and failing only when at some point a different drive
> is replaced and the RAID rebuild has to go over all stripes of all
> drives. Such “multiple failures” are due to poor sysadmin’s work:
> not frequent enough RAID verification.

You mean there can be failures which can be detected during a
raid-check and can still be repaired using the other disk, but they
can be impossible to repair when a disk has failed?

> If software raid-check does the same, then it makes a lot of sense,
> and I am more with RedHat's weekly cron job, than with Debian’s
> Monthly.

How often do partial failures occur during normal operation?  In case
there was a power failure, it's probably a good idea to do a check
anyway.
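
If monthly turns out to be enough, I suppose it is just a matter of editing the
cron entry the mdadm package installs, something like this (a sketch -- I have
not checked the exact entry CentOS 7 ships, and the tunables appear to live in
/etc/sysconfig/raid-check):

  # /etc/cron.d/raid-check
  0 1 1 * * root /usr/sbin/raid-check    # 01:00 on the 1st of each month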

> Valeri
> 
> > I just checked on Fedora 32.  It does not run raid-check at all, at least 
> > not
> > via a cron entry.  /usr/sbin/raid-check is available, though.  Is that an
> > oversight?  (I started it manually now and will check if it's run once I 
> > update
> > to 33.)

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos