Re: [lustre-discuss] Fwd: ldiskfs vs zfs

2021-02-17 Thread Cameron Harr via lustre-discuss
Sudheendra, You will get varied answers depending on who you ask. We are strong believers in ZFS. Of course, as a major contributor to ZFS-on-Linux, we're heavily biased, but we believe the management features (snapshots, etc.) outweigh any performance deficiencies compared to
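As a minimal sketch of the management workflow being referred to (pool and dataset names here are hypothetical, not LLNL's actual layout), ZFS snapshots on a Lustre target look roughly like:

    # Point-in-time snapshot of a hypothetical MDT dataset
    zfs snapshot mdt0pool/mdt0@pre-upgrade
    # List snapshots and the space they hold
    zfs list -t snapshot mdt0pool
    # Roll back if a change goes badly
    zfs rollback mdt0pool/mdt0@pre-upgrade
    # Destroy the snapshot once it is no longer needed
    zfs destroy mdt0pool/mdt0@pre-upgrade

(Newer Lustre releases also provide lctl snapshot_* wrappers that coordinate snapshots across all ZFS-backed targets of a filesystem.)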

Re: [lustre-discuss] Missing OST's from 1 node only

2021-10-12 Thread Cameron Harr via lustre-discuss
I don't know the problem here, but you might want to look for connectivity issues from the client to the OSS(s) that house those two missing OSTs. I would imagine the lustre.log would show such errors in bulk. I've seen where an IB subnet manager gets in a weird state
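A minimal sketch of that kind of connectivity check from the client side (the NID and target names below are hypothetical):

    # Which OSTs does this client consider inactive?
    lfs check osts
    # Per-target import state as seen by the client
    lctl get_param osc.*.import | grep -E 'state|target'
    # LNet-level reachability of the OSS serving the missing OSTs
    lctl ping 192.168.1.10@o2ib
    lnetctl ping 192.168.1.10@o2ib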

Re: [lustre-discuss] lustre lts roadmap

2021-11-23 Thread Cameron Harr via lustre-discuss
Einar, It's easier to upgrade the Lustre version than your OS, IMO. So, if you want to run a RHEL/CentOS 8 derivative, then you may need to go with 2.14 on your servers for now and then upgrade to 2.15 (future LTS), once that becomes stable. That's our current plan. Cameron On 11/22/21

Re: [lustre-discuss] question regarding du vs df on lustre

2022-04-19 Thread Cameron Harr via lustre-discuss
One thing you can look at is running 'zpool iostat 1' (there are many options) to monitor that ZFS is still doing I/O during that time gap. With NVMe though, as Andreas said, I would expect that time gap to last seconds to minutes, not hours. On 4/19/22 02:16, Einar Næss Jensen wrote: Thank
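For reference, a sketch of the kind of monitoring meant here (the pool name is hypothetical):

    # Pool-wide bandwidth and IOPS, refreshed every second
    zpool iostat 1
    # Per-vdev breakdown for a specific pool
    zpool iostat -v mdt0pool 1
    # Average latency columns, on newer ZFS releases
    zpool iostat -l mdt0pool 1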

Re: [lustre-discuss] Lustre 2.12.6 on RHEL 7.9 not able to mount disks after reboot

2022-08-09 Thread Cameron Harr via lustre-discuss
JC, The message where it asks if the MGS is running is a pretty common error that you'll see when something isn't right. There's not a lot of detail in your message, but the first step is to make sure your OST device is present on the OSS server. You mentioned remounting the RAID directories; is
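A rough checklist for that first step, assuming hypothetical device, pool, and NID names (adjust for ldiskfs-on-RAID vs ZFS backends):

    # On the OSS: is the backing device visible after the reboot?
    lsblk
    cat /proc/mdstat          # if the OST sits on MD RAID
    zpool import              # lists importable pools if ZFS-backed
    # Can the server reach the MGS over LNet?
    lctl ping 10.0.0.1@o2ib
    # Then retry the Lustre mount and watch the kernel log
    mount -t lustre /dev/md0 /mnt/lustre/ost0
    dmesg | tail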

Re: [lustre-discuss] BCP for High Availability?

2023-01-19 Thread Cameron Harr via lustre-discuss
We (LLNL) were probably that Lab using pacemaker-remote, and we still are as it generally works and is what we're used to. That said, on an upcoming system, we may end up trying 2-node HA clusters due to the vendor's preference. I'm not sure what specifics you're interested in, but as you
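As a very rough sketch only (pcs syntax varies by release, node and resource names are hypothetical, and many sites use a dedicated Lustre resource agent rather than the generic Filesystem one), a 2-node cluster managing a single OST might look like:

    # Hypothetical 2-node cluster for one Lustre target
    pcs cluster setup lustre-ha oss1 oss2
    pcs cluster start --all
    # Manage the target mount with the generic Filesystem agent
    pcs resource create ost0 ocf:heartbeat:Filesystem \
        device=ostpool/ost0 directory=/mnt/lustre/ost0 fstype=lustre \
        op monitor interval=30s
    # Prefer oss1, fail over to oss2 if it goes down
    pcs constraint location ost0 prefers oss1=100
    # (Importing the zpool itself, fencing, etc. need their own resources.)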

Re: [lustre-discuss] About Lustre small files performace(8k) improve

2023-03-27 Thread Cameron Harr via lustre-discuss
I'll assume here you're referring to MPI-IO, which is not really a "feature" but a way to perform parallel I/O using the message passing interface (MPI) stack. There can also be different interpretations of what MPI-IO exactly means: many HPC applications (and benchmarks such as IOR) use MPI
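A sketch of what an IOR run through the MPI-IO backend looks like at the 8K transfer size being asked about, with hypothetical process counts and paths:

    # Single shared file through the MPIIO backend, 8K transfers
    mpirun -np 64 ior -a MPIIO -t 8k -b 64m -w -r -o /lustre/scratch/ior_ssf
    # File-per-process variant for comparison
    mpirun -np 64 ior -a MPIIO -t 8k -b 64m -w -r -F -o /lustre/scratch/ior_fpp
    # Re-running with -a POSIX shows what the same pattern gets without MPI-IO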

Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-10 Thread Cameron Harr via lustre-discuss
if it loses too many devices. And striped mirrors may see better performance over Z2. Regards Thomas On 1/9/24 20:57, Cameron Harr via lustre-discuss wrote: Thomas, We value management over performance and have knowingly left performance on the floor in the name of standardization, robustness

Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-09 Thread Cameron Harr via lustre-discuss
? I'm currently doing some tests, and the results favor software raid, in particular when it comes to IOPS. Regards Thomas On 1/5/24 19:55, Cameron Harr via lustre-discuss wrote: This doesn't answer your question about ldiskfs on zvols, but we've been running MDTs on ZFS on NVMe in production
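For context, IOPS comparisons like the one described are often run with fio against the raw block devices; a hedged sketch with hypothetical device names (destructive, scratch devices only):

    # 4K random writes against an MD RAID device
    fio --name=md-randwrite --filename=/dev/md0 --direct=1 \
        --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
        --numjobs=8 --runtime=60 --time_based --group_reporting
    # Repeat with --filename=/dev/zvol/mdt0pool/testvol to compare a ZFS zvol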

Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-05 Thread Cameron Harr via lustre-discuss
This doesn't answer your question about ldiskfs on zvols, but we've been running MDTs on ZFS on NVMe in production for a couple years (and on SAS SSDs for many years prior). Our current production MDTs using NVMe consist of one zpool/node made up of 3x 2-drive mirrors, but we've been
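A sketch of that layout, with hypothetical device and pool names (not LLNL's exact settings):

    # MDT pool built from three striped 2-way NVMe mirrors
    zpool create -o ashift=12 mdt0pool \
        mirror /dev/nvme0n1 /dev/nvme1n1 \
        mirror /dev/nvme2n1 /dev/nvme3n1 \
        mirror /dev/nvme4n1 /dev/nvme5n1
    zpool status mdt0pool
    # The MDT itself is then formatted with mkfs.lustre --mdt --backfstype=zfs ...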