Re: [xcat-user] OTHERPKGS Issue

2022-01-04 Thread Mark Gurevich



Yes, it is odd that "yum repolist" shows the status of the xCAT repos as
"enabled: 0" although the .repo files contain "enabled=1".
Have you tried "yum clean all"?

Mark Gurevich
Poughkeepsie Development Lab
HPC Software Development - xCAT

"If we knew what it was we were doing, it would not be called research,
would it?"
--Albert Einstein





From:   "Mesbah Mohamady" 
To: "xCAT Users Mailing list" 
Date:   12/29/2021 11:47 AM
Subject: [EXTERNAL] Re: [xcat-user] OTHERPKGS Issue



Yes, yum is installed:

[root@xcat-comp08 ~]# rpm -qa |  grep  -i yum
yum-plugin-fastestmirror-1.1.31-53.el7.noarch
yum-metadata-parser-1.1.4-10.el7.x86_64
yum-3.4.3-167.el7.centos.noarch


When I try to install manually on the compute node, yum fails because it
tries to reach the CentOS Internet repositories, and the compute node has no
Internet access:
[root@xcat-comp08 ~]# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=stock error was
14: curl#7 - "Failed to connect to 2604:1380:1001:6c00::1: Network is unreachable"
Loading mirror speeds from cached hostfile
Loading mirror speeds from cached hostfile
Loading mirror speeds from cached hostfile
Loading mirror speeds from cached hostfile
Loading mirror speeds from cached hostfile
repo id                  repo name                 status
base/7/x86_64            CentOS-7 - Base           0
extras/7/x86_64          CentOS-7 - Extras         0
updates/7/x86_64         CentOS-7 - Updates        0
xCAT-centos7.8-path0     xCAT-centos7.8-path0      0
xcat-otherpkgs0          xcat-otherpkgs0           0
repolist: 0
[root@xcat-comp08 ~]# yum install vim -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=stock error was
14: curl#7 - "Failed to connect to 2604:1380:2001:d00::3: Network is unreachable"


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: base/7/x86_64



Also, it is strange that "yum repolist" shows the status of the xCAT repos as
"enabled: 0" although the .repo files contain "enabled=1":

[root@xcat-comp08 ~]# cat /etc/yum.repos.d/xCAT-otherpkgs0.repo
[xcat-otherpkgs0]
name=xcat-otherpkgs0
baseurl=http://192.168.14.231:80/install/post/otherpkgs/centos7.8/x86_64/
enabled=1
gpgcheck=0
skip_if_unavailable=True
[root@xcat-comp08 ~]# cat /etc/yum.repos.d/xCAT-centos7.8-path0.repo
[xCAT-centos7.8-path0]
name=xCAT-centos7.8-path0
baseurl=http://192.168.14.231:80/install/centos7.8/x86_64
enabled=1
gpgcheck=0
skip_if_unavailable=True
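A possible workaround, following option 3 of yum's own error message: run yum with every repo disabled except the local xCAT ones, so the unreachable CentOS mirrors are never contacted. Note also that in plain `yum repolist` output the status column is each repo's package count rather than its enabled flag (`yum repolist all` shows enabled/disabled explicitly), so the 0 may simply mean no metadata was loaded. The sketch below assumes the xCAT-generated files match /etc/yum.repos.d/xCAT-*.repo, as in the listing; the helper name xcat_repo_ids is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the repo IDs (the [section] names) defined in the
# xCAT-generated .repo files in DIR, joined with commas for --enablerepo.
# Assumption: the files are named xCAT-*.repo, as in the listing above.
xcat_repo_ids() {
    local dir=$1
    sed -n 's/^\[\(.*\)\]$/\1/p' "$dir"/xCAT-*.repo | paste -sd, -
}

# Example use on the compute node (not executed here):
#   yum clean all
#   yum --disablerepo='*' --enablerepo="$(xcat_repo_ids /etc/yum.repos.d)" repolist
#   yum --disablerepo='*' --enablerepo="$(xcat_repo_ids /etc/yum.repos.d)" install -y traceroute
```

Disabling everything first and then re-enabling by ID keeps the command independent of however many distribution repos happen to be configured on the node.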

On Wed, Dec 29, 2021 at 3:06 AM  wrote:
  From the log entry, yum does not appear to be installed.

  Have you verified this on a node with rpm -qa | grep yum?

  On Tue, Dec 28, 2021 at 4:47 PM Mesbah Mohamady  wrote:
   Dear all,

   I have an issue configuring a directory for "otherpkgs" on the
   management node to be used by compute nodes to download new RPMs.

   I used the approach below:

   - On management node:
   # mkdir -p /install/post/otherpkgs/centos7.8/x86_64
   # cd /install/post/otherpkgs/centos7.8/x86_64
   # yumdownloader traceroute.x86_64 --resolve
   # createrepo .
   # vim otherpkg.pkglist
      traceroute
   # chdef -t osimage centos7.8-x86_64-install-compute \
       otherpkglist=/install/post/otherpkgs/centos7.8/x86_64/otherpkg.pkglist
   # lsdef -t osimage centos7.8-x86_64-install-compute
      Object name: centos7.8-x86_64-install-compute
       imagetype=linux
       osarch=x86_64
       osdistroname=centos7.8-x86_64
       osname=Linux
       osvers=centos7.8
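On the management node, a quick sanity check of the otherpkgs directory can rule out the simplest causes before looking at the node side. The helper below is a hypothetical sketch: it assumes the pkglist holds plain package names (one per line, '#' lines skipped) and that RPM files follow the usual name-version-release.arch.rpm naming; pkglist entries with subdirectory prefixes are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for an otherpkgs directory on the management node.
# Assumes PKGLIST holds plain package names (one per line, '#' lines skipped)
# and RPM files named name-version-release.arch.rpm.
check_otherpkgdir() {
    local dir=$1 pkglist=$2 pkg rc=0
    # createrepo writes its metadata index to repodata/repomd.xml.
    if [ ! -f "$dir/repodata/repomd.xml" ]; then
        echo "no repodata in $dir (run createrepo)" >&2
        rc=1
    fi
    while read -r pkg; do
        case $pkg in ''|'#'*) continue ;; esac
        # Look for at least one RPM whose file name starts with "<pkg>-<digit>".
        if ! ls "$dir/$pkg"-[0-9]*.rpm >/dev/null 2>&1; then
            echo "no RPM found for $pkg in $dir" >&2
            rc=1
        fi
    done < "$pkglist"
    return "$rc"
}

# Example (not executed here), with the paths from the steps above:
#   check_otherpkgdir /install/post/otherpkgs/centos7.8/x86_64 \
#       /install/post/otherpkgs/centos7.8/x86_64/otherpkg.pkglist
```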
       

Re: [xcat-user] How to modify list of modules to add to initrd?

2022-01-04 Thread Mark Gurevich


"Failed to find module" messages are probably from "sub mkinitrd()" in
/opt/xcat/share/xcat/netboot/rh/genimage
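One way to see which requested drivers genimage cannot satisfy is to look for them in the image's own module tree. A plausible culprit for 'ext3' is that the separate ext3 driver was removed from mainline Linux in 4.3 (ext4 now handles ext3 filesystems), so a 5.10 kernel like the one in the log would not ship an ext3 module. The helper below is a hypothetical sketch; it checks both '-' and '_' spellings because module names treat the two as equivalent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each requested module that has no matching .ko
# file (compressed or not) under ROOT/lib/modules/KVER.
# Usage: missing_modules ROOTIMG KVER module...
missing_modules() {
    local root=$1 kver=$2 mod dashed underscored
    shift 2
    for mod in "$@"; do
        # Module names treat '-' and '_' as equivalent, so try both spellings.
        dashed=${mod//_/-}
        underscored=${mod//-/_}
        if ! find "$root/lib/modules/$kver" \
                \( -name "$dashed.ko*" -o -name "$underscored.ko*" \) \
                2>/dev/null | grep -q .; then
            printf '%s\n' "$mod"
        fi
    done
}

# Example (not executed here), using the image and kernel from the log:
#   missing_modules /install/netboot/rhawk_bjh/rootimg \
#       5.10.59-rt52-RedHawk-8.4-trace ext3 ext4 nfs
```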






From:   "Hannum, Keith" 
To: "xcat-user@lists.sourceforge.net"

Date:   01/04/2022 02:37 PM
Subject: [EXTERNAL] [xcat-user] How to modify list of modules to add to initrd?



 ___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user





[xcat-user] How to modify list of modules to add to initrd?

2022-01-04 Thread Hannum, Keith
I want to build my diskless nodes with FIPS enabled, and in RHEL 8 this requires
building FIPS into the initrd. I can do this with a custom dracut config, but
this file needs to be in the rootimg chroot before genimage generates the
initrd, which isn't possible when creating an image from scratch. I am also
getting errors with ext3 and nfs on my RHEL 8.4 management node (MN). I'd like
to clean up this module list and add FIPS so that, on the initial creation of
the image, everything is loaded correctly. How is this list created and
maintained?
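Since the genimage output in this thread shows dracut being run via chroot into the rootimg, a dracut.conf.d snippet placed inside the image root should be picked up the next time the initrd is built there. A minimal sketch, assuming the image root already exists (i.e. after a first genimage run) and using dracut's fips module; the helper name and the 40-fips.conf file name are illustrative. The kernel must also boot with fips=1, which this does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a dracut config snippet to an existing image root
# so that dracut, when run inside that root, includes the fips module.
enable_fips_dracut() {
    local root=$1
    mkdir -p "$root/etc/dracut.conf.d"
    # Keep the spaces inside the quotes: dracut appends this to a module list.
    printf 'add_dracutmodules+=" fips "\n' > "$root/etc/dracut.conf.d/40-fips.conf"
}

# Example (not executed here):
#   enable_fips_dracut /install/netboot/rhawk_bjh/rootimg
#   ...then regenerate the initrd for that osimage, e.g. by rerunning genimage
#   (genimage's --onlyinitrd option, if your xCAT version has it, avoids a
#   full image rebuild).
```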

Added net_failover.ko as an autodetected dependency
Added failover.ko as an autodetected dependency
Added mlx4_core.ko as an autodetected dependency
Added i2c-algo-bit.ko as an autodetected dependency
Added dca.ko as an autodetected dependency
Added mdio.ko as an autodetected dependency
Enter the dracut mode. Dracut version: 049. Dracut directory: dracut_047.
Try to load drivers: failover mdio dca i2c-algo-bit mlx4_core net_failover tg3 bnx2 bnx2x e1000 e1000e igb mlx4_en virtio_net be2net ext3 ext4 to initrd.
chroot /install/netboot/rhawk_bjh/rootimg dracut  -N --compress "/bin/pigz -p 16 " -f /tmp/initrd.653785.gz 5.10.59-rt52-RedHawk-8.4-trace
dracut: No '/dev/log' or 'logger' included for syslog logging
dracut-install: Failed to find module 'ext3'
dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.bwzPzt/initramfs --kerneldir /lib/modules/5.10.59-rt52-RedHawk-8.4-trace/ -m failover mdio dca i2c_algo-bit mlx4_core net_failover tg3 bnx2 bnx2x e1000 e1000e igb mlx4_en virtio_net be2net ext3 ext4
the initial ramdisk for stateless is generated successfully.
Try to load drivers: failover mdio dca i2c-algo-bit mlx4_core net_failover tg3 bnx2 bnx2x e1000 e1000e igb mlx4_en virtio_net be2net ext3 ext4 to initrd.
chroot /install/netboot/rhawk_bjh/rootimg dracut  -N --compress "/bin/pigz -p 16 " -f /tmp/initrd.653785.gz 5.10.59-rt52-RedHawk-8.4-trace
dracut: No '/dev/log' or 'logger' included for syslog logging
dracut-install: Failed to find module 'ext3'
dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.xkPwa8/initramfs --kerneldir /lib/modules/5.10.59-rt52-RedHawk-8.4-trace/ -m failover mdio dca i2c_algo-bit mlx4_core net_failover tg3 bnx2 bnx2x e1000 e1000e igb mlx4_en virtio_net be2net ext3 ext4
dracut-install: ERROR: installing 'nfs'
dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.xkPwa8/initramfs --kerneldir /lib/modules/5.10.59-rt52-RedHawk-8.4-trace/ -m nfs
the initial ramdisk for statelite is generated successfully.




-Keith

Keith Hannum
Lockheed Martin
keith.han...@lmco.com



Re: [xcat-user] ib.rhels8.x86_64.pkglist

2022-01-04 Thread Thomas HUMMEL

Hello,

let me share our experience with MOFED/RHEL8/xCAT:

We actually don't use xCAT to install MOFED but Ansible afterwards, either
in the genimage-generated chroot or on stateful nodes initially installed
by xCAT.


Note that MOFED doesn't only support kernel point releases. Out of the box,
with no need to rebuild, it MAY also support errata kernels thanks to the
KMP/weak-modules mechanism. The keyword here is MAY: as they state
themselves, it MAY also break:


Quoting their Overview:

"MLNX_OFED package for RedHat comes with RPMs that support KMP 
(weak-modules), meaning that when a new errata kernel is installed, 
compatibility links will be created under the weak-updates directory for 
the new kernel. Those links allow using the existing MLNX_OFED kernel 
modules without the need for recompilation. However, at times, the ABI 
of the new kernel may not be compatible with the MLNX_OFED modules, 
which will prevent loading them. In this case, the MLNX_OFED modules 
must be rebuilt against the new kernel."


We have indeed experienced both cases.

Note: their stock RPMs come with KMP enabled; I think you have to use
--kmp if you want to carry it over to your custom builds.


When a rebuild is requested, their distribution provides and uses
kmp_compat.sh, which, through depmod, can detect kernel incompatibility
(and then does not install the weak modules).


However, a dnf kernel upgrade may still result in a broken system (if the
ABI was broken), which is why MOFED is a pain in the ass.


A real solution, which we have mentioned to them several times but which I
don't think they plan to implement, would be either:


a) some DKMS-like mechanism (as for the NVIDIA drivers, same company!)
b) or at least some yum plugin preventing upgrades to an incompatible kernel

In the stateless case, we are thinking about making our Ansible playbook
fail when there is neither an extra nor a weak-updates directory, which is
a sign of broken compatibility. That way, at least we won't boot a broken
image. The stateful case would still have to be handled as well.
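The stateless guard described above could look roughly like this, as a shell check run against the image root before it is packed. It assumes MOFED's KMP packages place their modules under /lib/modules/<kver>/extra (with links under weak-updates for errata kernels) and looks specifically for an mlx*_core module; the function name is hypothetical and the Ansible wiring is left out.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the image root ROOT has an mlx*_core
# module for kernel KVER under extra/ or weak-updates/, which is where
# MOFED's KMP packages are assumed to put their modules (or links to them).
ofed_modules_present() {
    local root=$1 kver=$2 dir
    for dir in "$root/lib/modules/$kver/extra" \
               "$root/lib/modules/$kver/weak-updates"; do
        if [ -d "$dir" ] && find "$dir" -name 'mlx*_core.ko*' | grep -q .; then
            return 0
        fi
    done
    return 1
}

# Example (not executed here): abort before packing a stateless image.
#   ofed_modules_present /install/netboot/<osimage>/rootimg "$kver" ||
#       { echo "no MOFED modules for $kver" >&2; exit 1; }
```

Checking for an actual module file rather than only the directories is slightly stricter than the idea in the post, since weak-updates may also hold links for other vendors' drivers.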


Note that we also hit buggy RPM builds when rebuilding MOFED RPMs with the
add_kernel_support option of an older (5.1) MOFED package: it introduced a
dependency loop, with rdma-core being obsoleted by the mlnx-ofed-all and
mlnx-ofed-all-useronly packages but also required by them.
They fixed this; replacing the create_mlnx_ofed_installers.pl and
mlnx_add_kernel_support.sh scripts with the ones from MOFED 5.5 was enough.


Finally, we install the following packages (with Ansible, but the list
could also go in your pkglist):


  - mlnx-ofed-basic
  - ibutils2
  - qperf

Hope this helps

--
Thomas HUMMEL




Re: [xcat-user] ib.rhels8.x86_64.pkglist

2022-01-04 Thread Jon Diprose
Whilst not a direct answer to your question, I have stopped using a postscript 
for this task and instead create a local repository containing the OFED rpms 
and just use the regular otherpkgs mechanism. This is largely because I run a 
mixed CX3/CX4/CX5/CX6 HBA environment and the CX3 cards require staying on the 
LTS OFED release but I want the newer OFED where possible. Managing that just 
got too messy with the script.
Mellanox OFED being what it is, it's obviously not that simple. See the
following Mellanox links, though note I don't use their repo because they only
build for OS point-release baseline kernels:

https://www.mellanox.com/support/mlnx-ofed-public-repository
https://www.mellanox.com/related-docs/prod_software/Using_MLNX_OFED_Repository.pdf

Local repo setup instructions from my notes:

===5.4-1.0.3.0===

ofedver="5.4-1.0.3.0"

# NB CentOS specific but easy to adapt for other OSes - or just hard code
pointreleasever="$( sed 's/CentOS Linux release \([78]\.[0-9][0-9]*\).*$/\1/' /etc/redhat-release )"

ofed="MLNX_OFED_LINUX-${ofedver}-rhel${pointreleasever}-x86_64"

wget "https://content.mellanox.com/ofed/MLNX_OFED-${ofedver}/${ofed}.tgz"
tar xf "${ofed}.tgz"

# If adding kernel support is required:
# As root on buildbox
# - must have the desired kernel version installed
# - the following yum install adds missing dependencies for me but omits
#   several already installed in my images
yum -y install libtool
"${ofed}/mlnx_add_kernel_support.sh" -m "${ofed}" --make-tgz --kmp --yes -v
mv "/tmp/${ofed}-ext.tgz" ./

# On a box with the repo filestore mounted
mkdir -p "/mnt/repo/mellanox/mlnx_ofed/${ofedver}/$(uname -r)"
tar -C "/mnt/repo/mellanox/mlnx_ofed/${ofedver}/$(uname -r)" --strip 1 -xf \
    "${ofed}-ext.tgz" "${ofed}-ext/RPMS"

# Shorter alternative if adding kernel support is not required (point-release
# baseline kernel only)
wget "https://content.mellanox.com/ofed/MLNX_OFED-${ofedver}/${ofed}.tgz"
mkdir -p "/mnt/repo/mellanox/mlnx_ofed/${ofedver}/$(uname -r)"
tar -C "/mnt/repo/mellanox/mlnx_ofed/${ofedver}/$(uname -r)" --strip 2 -xf \
    "${ofed}.tgz" "./${ofed}/RPMS"

===
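The sed above only recognizes CentOS release strings. For RHEL 8 and its rebuilds, one possible adaptation is to read VERSION_ID from /etc/os-release, which there carries the point release (e.g. 8.4); on CentOS 7 it is just 7, so the sed is still needed there. The helper below is a hypothetical sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print VERSION_ID from an os-release file
# (defaults to /etc/os-release). The file is sourced in a subshell so its
# variables do not leak into the caller.
point_release() {
    local file=${1:-/etc/os-release}
    ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

# Example (not executed here):
#   ofed="MLNX_OFED_LINUX-${ofedver}-rhel$(point_release)-x86_64"
```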

In our mixed CX3/CX4+ environment I need to combine this with a bit of yum & 
xcat trickery to get the correct LTS/non-LTS OFED version for the HBA and the 
build for the correct kernel. I have an rpm that causes two files to be created 
under /etc/yum/vars - 'mlnxofed' containing the OFED version number I want 
given what HBA is in the box and 'uname_r' being the output of `/usr/bin/rpm -q 
kernel --last | head -1 | sed 's/^kernel-\([^ ]*\).*$/\1/'`, both derived at 
rpm installation time (the latter might even work for genimage - I'm diskful so 
haven't checked). The '/mnt/repo/mellanox/mlnx_ofed' directory used above is 
presented under the configured otherpkgdir for the osimage and my 
otherpkgs.pkglist then includes:

rescomp-combined/bmrc-mlnxofed-yum-vars
#NEW_INSTALL_LIST#
mlnx_ofed/$mlnxofed/$uname_r/RPMS/mlnx-ofed-all.noarch
mlnx_ofed/$mlnxofed/$uname_r/RPMS/mlnx-fw-updater 

The '#NEW_INSTALL_LIST#' ensures the yum vars provided by
bmrc-mlnxofed-yum-vars are present before the next two lines get interpreted by
yum. I couldn't tell you whether anything other than yum-based package
management is happy to do variable substitution on the repo path, or how it
would be configured if it did. A subsequent '#NEW_INSTALL_LIST#' might be
advisable if you have other packages to install that depend on OFED being
installed first.
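The install-time logic of a package like bmrc-mlnxofed-yum-vars might be sketched as below. This is a guess at the mechanism described, not the actual RPM: yum reads custom variables from files under /etc/yum/vars, one file per variable name, so writing mlnxofed and uname_r there makes $mlnxofed and $uname_r expand in repo paths. Choosing the OFED version for the installed HBA is left to the caller, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the install-time logic described above; this is
# not the actual bmrc-mlnxofed-yum-vars package.

# Read `rpm -q kernel --last` output on stdin and print the newest kernel's
# version-release.arch (same sed expression as in the post).
newest_kernel() {
    head -1 | sed 's/^kernel-\([^ ]*\).*$/\1/'
}

# Write one file per yum variable into VARS_DIR (normally /etc/yum/vars);
# yum then expands $mlnxofed and $uname_r in repo definitions.
write_yum_vars() {
    local vars_dir=$1 ofed_version=$2 kernel_ver=$3
    mkdir -p "$vars_dir"
    printf '%s\n' "$ofed_version" > "$vars_dir/mlnxofed"
    printf '%s\n' "$kernel_ver" > "$vars_dir/uname_r"
}

# Example (not executed here; picking the OFED version per HBA is up to you):
#   write_yum_vars /etc/yum/vars 5.4-1.0.3.0 \
#       "$(/usr/bin/rpm -q kernel --last | newest_kernel)"
```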
A bit of a pain to set up, but it means I don't have to do an OFED build on
every node, I get to inspect and test the build before it gets rolled out, I
get better control over which OFED I am installing, and I can do a
kernel-and-OFED upgrade with 'updatenode compute -S'. OS kernel updates are
still a royal pain, as new OFED repos need building and the
bmrc-mlnxofed-yum-vars rpm needs updating first, and if I need to do an OFED
downgrade I have to run an explicit ofed_uninstall before the updatenode call;
still, it is less crazy than it was with the script. I need a new GPFS
portability layer for a kernel update, so it's not a step taken lightly anyway.
If your hardware is less diverse (or even just less ancient) you probably 
wouldn't need the fancy yum vars and could just hardcode the OFED and kernel 
versions in the included bit of otherpkgs.pkglist and update those values as 
necessary. For genimage, the rpm install isn't being run on the box the image 
will run on, so detecting the HBA at rpm install time is inappropriate anyway - 
as is installing the mlnx-fw-updater rpm, as that actually does the firmware 
update as part of the package install!
Just don't forget the mlnx_add_kernel_support.sh step if you do kernel updates 
between point releases. There's something about that in the mlnxofed_ib_install 
script for diskful installs but I don't see it for diskless.
Happy New Year,
Jon

--
Dr. Jonathan Diprose  Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine

[xcat-user] ib.rhels8.x86_64.pkglist

2022-01-04 Thread Vinícius Ferrão via xCAT-user
Hello,

I'm trying to use the OFED install script from xCAT, but some template files
seem to be missing; the first one is the one in the title:

/opt/xcat/share/xcat/ib/netboot/rh/ib.rhels8.x86_64.pkglist

Does anyone have the list of required packages for EL8 on x86_64?

Thanks all and happy new year.

PS: Is there anything else I should be aware of with the mlnxofed_ib_install script?


