Hey everyone,

I’m running into a strange problem after provisioning a node with xCAT, and
I’m trying to figure out if it’s something related to how I set up RAID1.

Setup:


   - Hardware: Supermicro server with AMD EPYC 7763 (64 cores)
   - OS: Oracle Linux 8.8 (kernel 4.18.0-477.21.1.el8_8)
   - Provisioning: xCAT
   - Storage:
      - 2x 480GB SATA SSDs in RAID1 for system partitions
      - 1x 1.8TB NVMe drive for /scratch
   - Filesystem: XFS
   - Network: Infiniband (ConnectX-5, switch: Mellanox SB8700/SB8790)

What's happening:

After provisioning, the node (node01) looks fine — it boots, mounts
storage, RAID syncs, networking is working, etc.

But if I run simple commands like:

cat /proc/cpuinfo
cat /etc/fstab
cat /proc/mounts

vim /root/file_test

the *SSH session freezes*.

(Other sessions are still fine and I can reconnect, so it is not a full
system crash.)

Other commands like:

cat /proc/mdstat
xfs_info /dev/md2
dmesg | grep error
dd if=/dev/sda of=/dev/null

work normally without any issues.

What I already checked:


   - RAID1 is synced (/proc/mdstat shows [UU]).
   - XFS filesystems mount cleanly (xfs_info looks good).
   - No obvious errors in dmesg or journalctl.
   - Disk performance (dd) looks normal.
   - CPU microcode seems fine (0xa0011d5 for all cores).
   - Unloading Infiniband drivers (mlx5_ib, mlx5_core) had no effect.
   - strace shows the freeze while reading through /proc/cpuinfo.
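
One more check that could narrow this down, given here as a sketch and not as a result from the node: read the same files with the output discarded instead of printed to the SSH terminal. If these reads finish, the hang is more likely in delivering output to the session than in the read itself. probe_read is a hypothetical helper name and the 5-second limit is an arbitrary choice; timeout is the GNU coreutils command.

```shell
#!/bin/sh
# probe_read (hypothetical helper): read FILE with a time limit and discard
# the data, so nothing is written to the SSH terminal.
# Caveat: a process blocked in uninterruptible sleep cannot be killed by
# timeout, so in that case the probe itself may hang as well.
probe_read() {
    file=$1
    limit=${2:-5}    # seconds to wait before giving up (arbitrary default)
    timeout "$limit" cat "$file" > /dev/null 2>&1
    status=$?
    case $status in
        0)   echo "ok: $file" ;;
        124) echo "timeout: $file" ;;    # 124 is timeout's "timed out" status
        *)   echo "error($status): $file" ;;
    esac
}

# The files from the commands that freeze the session.
for f in /proc/cpuinfo /etc/fstab /proc/mounts; do
    probe_read "$f"
done
```

If every file reports ok here but cat to the terminal still freezes, the file reads themselves are probably not the problem.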

Also important:

Other nodes (Dell servers with ConnectX-6) provisioned via the same xCAT
environment do not have this problem.

Could this be the cause?

*Could it be that something went wrong during the RAID1 creation with the
partitionfile script during provisioning?*

I created the RAID arrays (mdadm) during provisioning, plus a standalone
/scratch partition on the NVMe.
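
For reference, the [UU] check can be scripted roughly as below. This is a minimal sketch: check_mdstat is a hypothetical helper name, it only reports what /proc/mdstat says about member state, and it proves nothing about whether the partitionfile script built the arrays correctly.

```shell
#!/bin/sh
# check_mdstat (hypothetical helper): print one line per md array with its
# member-state string, e.g. [UU] for a healthy two-disk RAID1 or [U_] when
# one member is missing. Reads /proc/mdstat unless a file is given.
check_mdstat() {
    awk '
        /^md/ { name = $1 }                 # array header line, e.g. "md2 : active raid1 ..."
        match($0, /\[[U_]+\]/) {            # state string such as [UU] or [U_]
            state = substr($0, RSTART, RLENGTH)
            printf "%s %s %s\n", name, state, (state ~ /_/ ? "DEGRADED" : "ok")
        }
    ' "${1:-/proc/mdstat}"
}

# Only run against the live file when it exists (it is absent without md).
if [ -r /proc/mdstat ]; then
    check_mdstat
fi
```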

Thanks a lot if you have any ideas.

I’m happy to share more info if needed — just trying to understand if I
missed anything obvious.
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
