Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Andrew Elwell via lustre-discuss
On Wed, 11 May 2022 at 04:37, Laura Hild wrote:
> The non-dummy SRP module is in the kmod-srp package, which isn't included in
> the Lustre repository...
Thanks Laura, Yeah, I realised that earlier in the week, and have rebuilt the srp module from source via mlnxofedinstall, and sure enough

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Laura Hild via lustre-discuss
Hi Andrew- The non-dummy SRP module is in the kmod-srp package, which isn't included in the Lustre repository. I'm less certain than I'd like to be, as ours is a DKMS setup rather than kmod, and the last time I had an SRP setup was a couple years ago, but I suspect you may have success if you

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-08 Thread Andrew Elwell via lustre-discuss
On Fri, 6 May 2022 at 20:04, Andreas Dilger wrote:
> MOFED is usually preferred over in-kernel OFED, it is just tested and fixed a
> lot more.
Fair enough. However, is the 2.12.8-ib tree built with all the features? Specifically

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-06 Thread Laura Hild via lustre-discuss
Absolutely try MOFED. The problem you're describing is extremely similar to one we were dealing with in March after we patched to 2.12.8, right down to those call traces. Went away when we switched. -Laura

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-06 Thread Andreas Dilger via lustre-discuss
On May 5, 2022, at 07:16, Andrew Elwell via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: I've got a case open with the vendor to see if there are any firmware updates - but I'm not hopeful. These are 6-core single-socket Broadwells with 128G of RAM. Storage disks are mounted

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-05 Thread Andrew Elwell via lustre-discuss
> It's looking more like something filled up our space - I'm just
> copying the files out as a backup (mounted as ldiskfs just now) -
Ahem. Inode quotas are a good idea. Turns out that a user creating about 130 million directories rapidly is more than a small MDT volume can take. An update on

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-04-20 Thread Andrew Elwell via lustre-discuss
Thanks Stéphane, It's looking more like something filled up our space - I'm just copying the files out as a backup (mounted as ldiskfs just now) - we're running DNE (MDT and this one MDT0001) but I don't understand why so much space is being taken up in REMOTE_PARENT_DIR - we seem to have

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-04-19 Thread Stephane Thiell via lustre-discuss
Hi Andrew,
kernel: LustreError: 13921:0:(genops.c:478:class_register_device()) astrofs-OST-osc-MDT0001: already exists, won't add
is symptomatic of a llog index issue/mismatch on the MDT vs. MGT. I would check if the llog backup of MDT0001 (over ldiskfs in CONFIGS) matches the one on the