[lustre-discuss] MDS using D3710 DAS

2021-02-11 Thread Sid Young
G'day all, Is anyone using an HPE D3710 with two HPE DL380/385 servers in an MDS HA configuration? If so, is your D3710 presenting LVs to both servers at the same time AND are you using PCS with the Lustre PCS Resources? I've just received new kit and cannot get disk to present to the MDS servers a

Re: [lustre-discuss] [EXTERNAL] Elegant way to dump quota/usage database?

2021-02-11 Thread Kevin M. Hildebrand
I had the same problem. You can collect this information on a per-OST basis, and do the summation yourself. If you're collecting other metrics already, it's easy to do. We use telegraf and grafana to harvest metrics. On each OSS, under /proc/fs/lustre/osd-*/*-OST*/quota_slave you'll find several

Re: [lustre-discuss] LNET IB intermittent connection

2021-02-11 Thread Nathan Crawford
Hi Chris and Cory, I remember looking at configuring multi-rail when 2.12 came out for this very reason, but stopped when it looked like round-robin only. Is there a way to trick the LNet Health system into seeing one interface as "sick but not dead"? Also, when is 2.14 coming out :) For w

Re: [lustre-discuss] LNET IB intermittent connection

2021-02-11 Thread Nathan Crawford
Hi Colin, I've done checks of the performance/error counters, and used the in-OS-repo version of ibdiagnet. Apart from a couple of nodes with known failing cables/HCAs (not involved in the LNet connection problems), the fabric was healthy. It did pick up that the IPoIB partition was still at 20 Gbit/s from wh

Re: [lustre-discuss] LNET IB intermittent connection

2021-02-11 Thread Horn, Chris
FYI, multi-rail in 2.12 will round robin traffic between both @tcp and @o2ib networks (assuming peers are reachable on both). If @o2ib flakes out then traffic should shift entirely to @tcp, but there isn’t a way to specify that traffic go to @tcp only when there’s a problem with @o2ib. You need

Re: [lustre-discuss] LNET IB intermittent connection

2021-02-11 Thread Spitz, Cory James
Hi, Nate. You asked, “can LNET be easily configured to go over the @tcp connection when the @o2ib flakes out?” Yes, you can use LNet Multi-Rail for it and that _is_ covered in the “fine manual”, chapter 16 ☺ https://doc.lustre.org/lustre_manual.xhtml#lnetmr -Cory On 2/10/21, 4:54 PM, "lustre-

Re: [lustre-discuss] [EXTERNAL] Elegant way to dump quota/usage database?

2021-02-11 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
FWIW, I've had the same need and I do exactly the brute force iteration you speak of for our LFS's to log user usage vs time. For our NFS server, we use ZFS and there is a nice one-liner to give that info in ZFS. I agree, it would be nice if there was a similar one-liner for lustre. -O

[lustre-discuss] Elegant way to dump quota/usage database?

2021-02-11 Thread Steve Barnet
Hey all, I would like to be able to dump the usage tracking and quota information for my Lustre filesystems. I am currently running Lustre 2.12. lfs quota -u $user $filesystem works well enough for a single user. But I have been looking for a way to get that information for all users of the
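
The brute-force iteration discussed in this thread (one `lfs quota -u $user $filesystem` call per user) could be sketched roughly as below. This is only an illustration, not the method any poster describes: the helper names are hypothetical, the user list is assumed to come from the local passwd database (a site using LDAP or similar would need a different source), and the `min_uid` cutoff of 1000 is an arbitrary assumption. The output of each call is kept as raw text rather than parsed.

```python
import pwd
import subprocess


def quota_command(user, filesystem):
    """Build the argument list for `lfs quota -u <user> <filesystem>`."""
    return ["lfs", "quota", "-u", user, filesystem]


def local_users(min_uid=1000):
    """Return sorted user names from the local passwd database.

    Assumption: regular accounts have a UID of at least min_uid.
    """
    return sorted(p.pw_name for p in pwd.getpwall() if p.pw_uid >= min_uid)


def dump_quotas(filesystem, users):
    """Run `lfs quota` once per user and return {user: raw stdout}."""
    report = {}
    for user in users:
        result = subprocess.run(
            quota_command(user, filesystem),
            capture_output=True,
            text=True,
            check=False,  # a failure for one user should not stop the loop
        )
        report[user] = result.stdout
    return report
```

On a Lustre client this might be called as `dump_quotas("/mnt/lustre", local_users())`, where `/mnt/lustre` is a hypothetical mount point. It issues one `lfs` invocation per user, so it will be slow on systems with many accounts.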

[lustre-discuss] Disabling max creates and migrating data doesn't seem to be reducing the usage on an OST

2021-02-11 Thread Kurt Strosahl
Good Morning, One of the OSTs in a lustre file system I manage is showing a higher usage. I attempted to stop writes by setting the max_create_count to zero and then moving data off it but that doesn't seem to be working. > lfs df | grep OST:15 lustre19-OST000f_UUID 71145018368 62653382656 849