Tomer, have you tried following these instructions? https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/discovery/mtms/discovery_using_defined.html
From: Tomer Shachaf <tomers...@matrix.co.il>
Sent: Wednesday, April 12, 2023 8:24 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] BitTorrent distribution of stateless images with xCAT interesting to anyone?

Can anybody help me with a guide on how to configure MTMS discovery or sequential discovery for nodes?

Thanks a lot

Best regards,
Tomer Shachaf | Integration and Infrastructure Engineer | Integration and Infrastructure Division | Matrix | Mobile 054-2686841 | tomers...@matrix.co.il | www.matrix.co.il

On 5 Apr 2023, at 16:37, Jarrod Johnson <jjohns...@lenovo.com> wrote:

On the "not even root is allowed to make changes" point, there are mechanisms to at least partially get there. However, the nature of things is that many PCIe devices do not consider such a boundary. Disabling KCS in a BMC is an option provided now. Without that access, at least our platforms don't provide a path to UEFI or BMC firmware updates or configuration changes. However, most devices you may add would likely still be open to manipulation. That manipulation is frequently bounded nowadays, though, with many components having a first-stage firmware loader that refuses to continue if the firmware payload does not pass a signature check.

As for SecureBoot, in practice it does protect kernel space, but user space is left uncovered (e.g. something like a malicious /etc/shadow is impossible to cover in the SecureBoot scheme).
The approach there would be trusted boot, with encrypted boot and sealing the encryption key to PCRs according to your desire for tamper detection and lockout. In this case, you could for example have booting from a rescue disk result in the system being unable to decrypt the boot volume (you may optionally have a 'recovery' password in another slot to allow password-based recovery). Or, if SecureBoot is disabled, it can't decrypt the boot volume. In the confluent diskless boot, it extends one of the PCRs so that not even root on the system can retrieve the API key from the TPM.

The PCRs cover things like firmware loads on boot components, so while you may not prevent a firmware change, you may be able to render an attacked device unable to read the boot volume. This one is tricky ground, as you are balancing protecting the data against attacks versus accidentally locking yourself out through an intended, innocuous change.

________________________________
From: Dr. Thomas Orgis <thomas.or...@uni-hamburg.de>
Sent: Wednesday, April 5, 2023 9:10 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] BitTorrent distribution of stateless images with xCAT interesting to anyone?

Some details on our setup in light of the approach Jarrod outlined …

On Wed, 29 Mar 2023 17:37:30 +0000, Jarrod Johnson <jjohns...@lenovo.com> wrote:

> On confluent diskless, there is an interesting benefit that becomes a
> challenge for bittorrent: a typical diskless node never downloads the
> whole diskless image. This means less ram sucked up by the diskless
> image, and also that the diskless image can be large without pruning.

I guess this is mitigated by our OS image being rather minimal to begin with. It only has the basic system software and drivers, up to a working C/C++ compiler setup that is able to bootstrap further software.
Such further software is provided in a versioned tree via NFS and managed via environment modules. So an approach that optimizes the usage of a large OS image by only keeping the necessary parts in memory would not benefit us much. The squashfs is below 1G, which is no big deal for our compute nodes with 64G of RAM. A full image of 10G would be annoying.

Rather, the split for us is between this system image of less than 1G and the software tree on NFS with 421G, grown over about 8 years of system lifetime. Add to that an uncounted number of anaconda, spack, and whatever other trees that users installed into their storage shares.

Getting whole images out to the cluster nodes quickly is very much valid for this scenario, and also for the next system we will set up. Of course one could imagine full-on NFS root, but there are reasons why that is out of fashion, and with a minimal main system image, it can be considered a mode of aggressive client caching. It might not matter much with 10G network or IB on the image server, but any avoidable bottleneck sucks, even if it does not hurt right now in practice.

> trick were done to only torrent the parts as needed locally

That does sound like a complexity nightmare … but it might still provide benefit, assuming that nodes mostly need the same parts. You'd need to do a lot of work to integrate those layers. Not worth it, I guess.

> the diskless images are now encrypted […] by node TPM

Hm. Use of TPMs on cluster nodes. Didn't think about that much, yet.

Another point: I'd love vendors to finally implement safeguards to ensure that root on a server cannot manipulate any firmware (be it network card or hard disk) from userspace, and especially cannot access the BMC, which should only answer to external IPMI requests. Can Secure Boot really ensure nothing has been messed with through a root exploit?
I'd love a simple switch that only allows certain platform changes in the pre-boot environment (BIOS, UEFI … and IPMI from the outside) and have things locked down once the Kernel boots. I still don't see how you really can trust a machine once someone had root on it, if you're really paranoid.

The whole machinery of crypto checking (Secure Boot) is a rather elaborate mess which could be avoided if there was a clear hardware barrier that only allows certain modifications (also to PCIe and SATA devices, at least onboard devices) outside the booted Linux context. If there's no way to modify things for rogue users/hackers, then you know the system is clean on a fresh boot from the network, and maybe after replacing any SATA or USB devices that just cannot be protected that way.

Is any vendor for compute nodes offering this kind of manipulation protection? I'd love that kind of security to start with. Not having to theoretically trash the hardware once someone possibly got a root exploit. Then talk about encrypting images and securing userspace …

> if [ "untethered" = "$(getarg confluent_imagemethod)" ]; then
>     mount -t tmpfs untethered /mnt/remoteimg
>     curl https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs -o /mnt/remoteimg/rootimg.sfs
> else
>     confluent_urls="$confluent_urls https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs"
>     /opt/confluent/bin/urlmount $confluent_urls /mnt/remoteimg
> fi

Looks easy enough.

> Is the logic for getting the image.
> One thing to note is that in a typical diskless image boot in
> confluent, the booted system does not see rootimg.sfs, so the torrent
> execution would have to stay in the 'initramfs' world (which does
> persist after boot, as a separate mount namespace)

I think that is why I hooked the rootimg.sfs up to /dev/loop0 back in the day, and hacked ctorrent to allow the block device as data source. The loop device stays accessible.

Anyone from xCAT with thoughts on this? Should I work on a patch for current xCAT (not sure where I'd find time to test that, though)? I don't know which kind of cluster management our next system will have. It could be that my path of least resistance is a quick hack on that one like I did with xCAT back in 2015 …

Alrighty then,

Thomas

--
Dr. Thomas Orgis
HPC @ Universität Hamburg

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
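On the PCR point made earlier in the thread (sealing a key to PCRs, and extending a PCR so that not even root can later retrieve the API key): the property rests on the TPM extend operation, which replaces a PCR value with the hash of the old value concatenated with a new measurement digest. Below is a minimal shell simulation of that hash chain for a SHA-256 bank. It is only a sketch under stated assumptions: the function names hex_to_bin, pcr_extend and unseal_allowed are made up for illustration, the measurement strings are placeholders, no real TPM or TPM tooling is involved, and it says nothing about which PCR confluent actually extends or how it does so.

```shell
#!/bin/sh
# Simulation only: models the TPM "extend" hash chain with sha256sum.
# All function names here are hypothetical; no real TPM is touched.

# Convert a hex string to raw bytes on stdout (POSIX sh, no xxd needed).
hex_to_bin() {
    for byte in $(echo "$1" | sed 's/../& /g'); do
        # Turn each hex byte into an octal escape that printf understands.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# pcr_extend OLD_HEX DATA: new PCR = SHA256(old PCR bytes || SHA256(DATA)).
pcr_extend() {
    digest=$(printf '%s' "$2" | sha256sum | cut -d' ' -f1)
    { hex_to_bin "$1"; hex_to_bin "$digest"; } | sha256sum | cut -d' ' -f1
}

# unseal_allowed CURRENT_HEX EXPECTED_HEX: stand-in for a PCR policy check;
# succeeds only if the PCR still holds the value the secret was sealed to.
unseal_allowed() {
    [ "$1" = "$2" ]
}

# Start from an all-zero PCR (32 zero bytes for SHA-256).
pcr=$(printf '%064d' 0)
pcr=$(pcr_extend "$pcr" "firmware")      # placeholder measurement
pcr=$(pcr_extend "$pcr" "bootloader")    # placeholder measurement
sealed_to=$pcr   # the value a secret would be sealed against

unseal_allowed "$pcr" "$sealed_to" && echo "before extend: unseal allowed"

# Extending once more changes the PCR, and there is no operation that
# sets it back to the earlier value short of a platform reset.
pcr=$(pcr_extend "$pcr" "lockout")
unseal_allowed "$pcr" "$sealed_to" || echo "after extend: unseal refused"
```

Run with any POSIX sh that has sha256sum available, this prints "before extend: unseal allowed" followed by "after extend: unseal refused". The point of the sketch is that the chain is one-way: software running later, root included, can extend further but cannot compute its way back to the sealed-to value. That is also why an innocuous firmware update changes the chain and can lock you out.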