[lustre-discuss] Hardware advice for homelab

2021-07-19 Thread Andrew Elwell via lustre-discuss
Hi folks, Given my homelab testing for Lustre tends to be contained within VirtualBox on my laptop ($work has a physical hardware test bed once mucking around gets serious), I'm considering expanding to some real hardware at home for testing. My MythTV days are over, but I'd ideally like an aarch64

[lustre-discuss] Corrupted? MDT not mounting

2022-04-19 Thread Andrew Elwell via lustre-discuss
Hi Folks, One of our filesystems seemed to fail over the holiday weekend - we're running DNE and MDT0001 won't mount. At first it looked like we'd run out of space (rc = -28) but then we were seeing this mount.lustre: mount /dev/mapper/MDT0001 at /lustre/astrofs-MDT0001 failed: File exists
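As an aside for anyone hitting similar messages: the `rc = -28` in Lustre log lines is a negated Linux errno, so decoding it quickly confirms (or rules out) the out-of-space theory. A minimal sketch, plain Python, nothing Lustre-specific assumed:

```python
import errno
import os

def decode_lustre_rc(rc):
    """Map a negated errno from a Lustre log line to its symbolic name and message."""
    err = -rc
    name = errno.errorcode.get(err, "UNKNOWN")
    return name, os.strerror(err)

print(decode_lustre_rc(-28))  # -> ('ENOSPC', 'No space left on device')
```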

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-04-20 Thread Andrew Elwell via lustre-discuss
Thanks Stéphane, It's looking more like something filled up our space - I'm just copying the files out as a backup (mounted as ldiskfs just now) - we're running DNE (MDT and this one MDT0001) but I don't understand why so much space is being taken up in REMOTE_PARENT_DIR - we seem to have

[lustre-discuss] jobstats

2022-05-27 Thread Andrew Elwell via lustre-discuss
Hi folks, I've finally started to re-investigate pushing jobstats to our central dashboards and realised there's a dearth of scripts / tooling to actually gather the job_stats files and push them to $whatever. I have seen the telegraf one, and the DDN fork of collectd seems somewhat abandonware.
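For anyone rolling their own collector in the meantime: the job_stats files are YAML-ish and simple enough to scrape with the standard library alone. A minimal sketch, assuming the usual `job_stats:` list layout (the `SAMPLE` text below is illustrative, not real output):

```python
import re

SAMPLE = """\
job_stats:
- job_id:          sleep.500
  snapshot_time:   1650000000
  open:            { samples: 4, unit: usecs, min: 14, max: 82, sum: 150 }
  write_bytes:     { samples: 2, unit: bytes, min: 4096, max: 8192, sum: 12288 }
"""

def parse_job_stats(text):
    """Parse job_stats text into {job_id: {counter: {samples, sum}}}."""
    jobs, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"-\s*job_id:\s+(\S+)", line)
        if m:
            current = m.group(1)
            jobs[current] = {}
            continue
        m = re.match(r"(\w+):\s+\{\s*samples:\s*(\d+)", line)
        if m and current is not None:
            s = re.search(r"sum:\s*(\d+)", line)
            jobs[current][m.group(1)] = {
                "samples": int(m.group(2)),
                "sum": int(s.group(1)) if s else 0,
            }
    return jobs

print(parse_job_stats(SAMPLE)["sleep.500"]["write_bytes"]["sum"])  # -> 12288
```

From a dict like this it's a short hop to emitting line protocol or whatever your central dashboards want.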

[lustre-discuss] 2.12.9-ib release?

2022-06-24 Thread Andrew Elwell via lustre-discuss
Hi folks, I see the 2.12.9/ release tree on https://downloads.whamcloud.com/public/lustre/, but I don't see the accompanying 2.12.9-ib/ one. ISTR someone needed to poke a build process last time to get this public - can they do the same this time please? Many thanks Andrew

[lustre-discuss] unclear language in Operations manual

2022-06-15 Thread Andrew Elwell via lustre-discuss
Hi folks, I've recently come across this snippet in the ops manual (section 13.8. Running Multiple Lustre File Systems, page 111 in the current pdf) > Note > If a client(s) will be mounted on several file systems, add the following > line to /etc/xattr.conf file to avoid problems when files

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Andrew Elwell via lustre-discuss
On Wed, 11 May 2022 at 04:37, Laura Hild wrote: > The non-dummy SRP module is in the kmod-srp package, which isn't included in > the Lustre repository... Thanks Laura, Yeah, I realised that earlier in the week, and have rebuilt the srp module from source via mlnxofedinstall, and sure enough

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-08 Thread Andrew Elwell via lustre-discuss
On Fri, 6 May 2022 at 20:04, Andreas Dilger wrote: > MOFED is usually preferred over in-kernel OFED, it is just tested and fixed a > lot more. Fair enough. However, is the 2.12.8-ib tree built with all the features? Specifically

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-05 Thread Andrew Elwell via lustre-discuss
> It's looking more like something filled up our space - I'm just > copying the files out as a backup (mounted as ldiskfs just now) - Ahem. Inode quotas are a good idea. Turns out that a user creating about 130 million directories rapidly is more than a small MDT volume can take. An update on
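The lesson generalises: watch inode counts, not just bytes. A small sketch using `statvfs` to report inode headroom on any mounted filesystem (pointing it at an ldiskfs-mounted MDT is the obvious use; the path below is just an example):

```python
import os

def inode_usage(path="/"):
    """Return (used, total, percent_used) inode counts for the filesystem at path."""
    st = os.statvfs(path)
    used = st.f_files - st.f_ffree
    pct = 100.0 * used / st.f_files if st.f_files else 0.0
    return used, st.f_files, pct

used, total, pct = inode_usage("/")
print(f"{used}/{total} inodes used ({pct:.1f}%)")
```

Wiring the percentage into an alert threshold would have flagged the 130-million-directory surge well before the MDT filled.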

[lustre-discuss] 2.15.x with ConnectX-3 cards

2022-12-10 Thread Andrew Elwell via lustre-discuss
Hi Gang, I've just gone and reimaged a test system in prep for doing an upgrade to Rocky 8 + 2.15.1 (What's the bet 2.15.2 comes out the night I push to prod?) However, the 2.15.1-ib release uses mofed 5.6 ... which no longer supports CX-3 cards. (yeah, it's olde hardware...) Having been badly

[lustre-discuss] BCP for High Availability?

2023-01-15 Thread Andrew Elwell via lustre-discuss
Hi Folks, I'm just rebuilding my testbed and have got to the "sort out all the pacemaker stuff" part. What's the best current practice for the current LTS (2.15.x) release tree? I've always done this as multiple individual HA clusters covering each pair of servers with common dual connected

[lustre-discuss] Version interoperability

2022-11-08 Thread Andrew Elwell via lustre-discuss
Hi folks, We're faced with a (short term measured in months, not years thankfully) seriously large gap in versions between our existing clients (2.7.5) and new hardware clients (2.15.0) that will be mounting the same file system. It's currently on 2.10.8-ib (ldiskfs) with connectx-5 cards, and I