Tomer, have you tried following these instructions?
https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/discovery/mtms/discovery_using_defined.html

From: Tomer Shachaf <tomers...@matrix.co.il>
Sent: Wednesday, April 12, 2023 8:24 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] BitTorrent distribution of stateless images 
with xCAT interesting to anyone?

Can anybody help me with a guide on how to configure MTMS discovery or
sequential discovery for nodes? Thanks a lot.

Regards,

Tomer Shachaf | Integration and Infrastructure Engineer | Integration and
Infrastructure Division | Matrix | Mobile 054-2686841 |
tomers...@matrix.co.il | www.matrix.co.il



On 5 Apr 2023, at 16:37, Jarrod Johnson <jjohns...@lenovo.com> wrote:

On the point that not even root should be allowed to make changes: there are
mechanisms to at least partially get there.  However, the nature of things is
that many PCIe devices do not respect such a boundary.  Disabling KCS in a BMC
is an option provided now.  Without that access, at least our platforms don't
provide a path to UEFI or BMC firmware updates or configuration changes.  Most
devices you may add would likely still be open to manipulation, though that
manipulation is frequently bounded nowadays, with many components having a
first-stage firmware loader that refuses to continue if the firmware payload
does not pass a signature check.

As for SecureBoot, in practice it does protect the kernel space, but user
space is left uncovered (e.g. something like a malicious /etc/shadow is
impossible to cover in the SecureBoot scheme).  The approach there would be
trusted boot, with an encrypted boot volume and the encryption key sealed to
PCRs according to your desire for tamper detection and lockout.  In this case
you could, for example, have booting from a rescue disk result in the system
being unable to decrypt the boot volume (you may optionally have a 'recovery'
password in another slot to allow password-based recovery).  Or, if SecureBoot
is disabled, it can't decrypt the boot volume.  The confluent diskless boot
extends one of the PCRs so that not even root on the system can retrieve the
API key from the TPM.  The PCRs cover things like firmware loads on boot
components, so while you may not prevent a firmware change, you may be able to
render an attacked device unable to read the boot volume.  This one is tricky
ground, as you are balancing protecting the data against attacks versus
accidentally locking yourself out through an intended, innocuous change.
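
To illustrate why extending a PCR locks a sealed secret away, here is a toy
model of the extend operation in the SHA-256 bank: the new PCR value is the
hash of the old value concatenated with the measurement digest. It does not
touch a real TPM and is not confluent's actual code; the measurement strings
and the all-zero starting value are assumptions made for the sketch, and it
relies on bash's printf '%b' plus sha256sum from coreutils.

```shell
#!/usr/bin/env bash
# Toy model of TPM PCR extension (SHA-256 bank). No real TPM is involved.

# hex_to_bin HEX: write the raw bytes encoded by HEX to stdout.
hex_to_bin() {
    printf '%b' "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# digest_of STRING: SHA-256 of STRING, standing in for a boot measurement.
digest_of() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# pcr_extend OLD_HEX DIGEST_HEX: print SHA-256(OLD || DIGEST) as hex.
pcr_extend() {
    hex_to_bin "$1$2" | sha256sum | cut -d' ' -f1
}

# PCRs start out as all zero bytes after a reset (64 hex digits here).
zero=$(printf '%064d' 0)

# Hypothetical measurement: the state a secret would be sealed against.
sealed_state=$(pcr_extend "$zero" "$(digest_of 'firmware-and-bootloader')")

# Hypothetical extra extend once the boot environment has used the secret.
locked_state=$(pcr_extend "$sealed_state" "$(digest_of 'secret-already-used')")

echo "sealed against: $sealed_state"
echo "after extend:   $locked_state"
```

Because the hash cannot be run backwards, nothing on the running system can
bring the PCR back to the sealed value short of a reboot that replays the
same measurements, which is the property the lockout relies on.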
________________________________
From: Dr. Thomas Orgis <thomas.or...@uni-hamburg.de>
Sent: Wednesday, April 5, 2023 9:10 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] BitTorrent distribution of stateless images 
with xCAT interesting to anyone?

Some details on our setup in light of the approach Jarrod outlined …

On Wed, 29 Mar 2023 17:37:30 +0000,
Jarrod Johnson <jjohns...@lenovo.com> wrote:

> On confluent diskless, there is an interesting benefit that becomes a
> challenge for bittorrent: a typical diskless node never downloads the
> whole diskless image.  This means less ram sucked up by the diskless
> image, and also that the diskless image can be large without pruning.

I guess this is mitigated by our OS image being rather minimal to begin
with. It only has the basic system software and drivers, up to a
working C/C++ compiler setup that is able to bootstrap further software.

Such further software is provided in a versioned tree via NFS and
managed via environment modules. So such an approach to optimize the
usage of a large OS image by only keeping necessary parts in memory
would not benefit us much. The squashfs is below 1G, which, for our
compute nodes with 64G of RAM is no big deal. A full image of 10G would
be annoying.

Rather, the split for us is this below-1G system image and the software
tree on NFS with 421G, grown over about 8 years of system lifetime. Add to
that an uncounted number of anaconda, spack, whatever trees that users
installed into their storage shares.

Getting whole images out to the cluster nodes quickly remains very much
valid for this scenario, and also for the next system we will set up. Of
course one could imagine full-on NFS root, but there are reasons why
that is out of fashion, and with a minimal main system image, it can be
considered as a mode of aggressive client caching.

It might not matter much with 10G network or IB on the image server,
but any avoidable bottleneck sucks, even if it does not hurt right now
in practice.

> trick were done to only torrent the parts as needed locally

That does sound like a complexity nightmare … but it might still
provide benefit, assuming that nodes need the same parts, mostly. You'd
need to do a lot of work to integrate those layers. Not worth it, I
guess.

> the diskless images are now encrypted […] by node TPM

Hm. Use of TPMs on cluster nodes. Didn't think about that much, yet.
Another point: I'd love vendors to finally implement safeguards to
ensure that root on a server cannot manipulate any firmware (be it
network card or hard disk) from userspace, and especially cannot access
the BMC, which should only answer to external IPMI requests. Can Secure
Boot really ensure nothing has been messed with through a root exploit?
I'd love a simple switch that only allows certain platform changes in
the pre-boot environment (BIOS, UEFI … and IPMI from the outside) and
has things locked down once the kernel boots.

I still don't see how you really can trust a machine once someone had
root on it, if you're really paranoid.

The whole machinery of crypto checking (Secure Boot) is a rather
elaborate mess which could be avoided if there was a clear hardware
barrier that only allows certain modifications (also to PCIe and SATA
devices, at least onboard devices) outside the booted Linux context. If
there's no way to modify things for rogue users/hackers, then you know
the system is clean on a fresh boot from the network, and maybe after
replacing any SATA or USB devices that just cannot be protected that way.

Is any vendor for compute nodes offering this kind of manipulation
protection?

I'd love that kind of security to start with. Not having to
theoretically trash the hardware once someone possibly got a root
exploit. Then talk about encrypting images and securing userspace …

> if [ "untethered" = "$(getarg confluent_imagemethod)" ]; then
>     mount -t tmpfs untethered /mnt/remoteimg
>     curl https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs -o /mnt/remoteimg/rootimg.sfs
> else
>     confluent_urls="$confluent_urls https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs"
>     /opt/confluent/bin/urlmount $confluent_urls /mnt/remoteimg
> fi

Looks easy enough.

> Is the logic for getting the image.  One thing to note is that in a
> typical diskless image boot in confluent, the booted system does not
> see rootimg.sfs, so the torrent execution would have to stay in the
> 'initramfs' world (which does persist after boot, as a separate mount
> namespace)

I think that is why I hooked the rootimg.sfs up to /dev/loop0 back in
the day, and hacked ctorrent to allow the block device as data source.
The loop device stays accessible.
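
For readers who have not done this before, attaching a squashfs image to a
loop device could look roughly like the sketch below. It is not the 2015
xCAT hack itself, which is not shown in this thread: the image path and the
mount point are assumptions, only /dev/loop0 is taken from the text above.
The script defaults to a dry run that prints the commands, since really
running them needs root.

```shell
#!/usr/bin/env bash
# Sketch: keep a downloaded squashfs reachable through a loop device.
# DRY_RUN=1 (the default) only prints the commands instead of running them.

DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# attach_image IMAGE LOOPDEV MOUNTPOINT
# Bind IMAGE to LOOPDEV read-only, then mount the squashfs from that device.
attach_image() {
    local image=$1 loopdev=$2 mountpoint=$3
    run losetup --read-only "$loopdev" "$image" &&
    run mount -t squashfs -o ro "$loopdev" "$mountpoint"
}

# Hypothetical paths, chosen only for illustration.
attach_image /mnt/remoteimg/rootimg.sfs /dev/loop0 /mnt/rootimg
```

With the image bound to the block device, a seeding process that has been
taught to read from a block device can keep serving pieces from /dev/loop0
even when the image file itself is no longer visible in the booted system.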

Anyone from xCAT with thoughts on this? Should I work on a patch for
current xCAT (not sure where I'd find time to test that, though)?

I don't know which kind of cluster management our next system will
have. It could be that my path of least resistance is a quick hack on
that one like I did with xCAT back in 2015 …


Alrighty then,

Thomas

--
Dr. Thomas Orgis
HPC @ Universität Hamburg


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user