[lustre-discuss] On the origins of lnet stat's "drop_count"

2022-12-13 Thread Christian Kuntz
Hello all, As is tradition, resident "off the beaten path" guy Christian here! I've been trying to track down some odd eviction behavior, and while conducting a network survey I noticed an odd development: a steadily increasing number of drops reported by lnet stat's "drop_count" statistic
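For anyone landing here from a search: a minimal sketch of where that counter surfaces in the modern tooling, assuming a release where lnetctl is available:

    # Global LNet counters, including drop_count, in YAML form
    lnetctl stats show

    # Per-interface statistics, which break counters down by NI
    lnetctl net show -v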

[lustre-discuss] Clients disconnect when there are 2+ growing file streams

2022-09-16 Thread Christian Kuntz
Howdy all! Long and short is: when multiple clients write long (100+ GB) contiguous streams to a file (a different file for each client) over an SMB export of the Lustre client mount, all streams but one lock up and the clients writing those streams momentarily lose their connections to the MDS/OSS.
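For context, a rough sketch of the kind of re-export being described; the share path and settings are illustrative, not the poster's actual config:

    # /etc/samba/smb.conf -- hypothetical share over a Lustre client mount
    [lustre]
        path = /mnt/lustre
        read only = no
        # oplocks/leases interact with cluster filesystems in surprising
        # ways and are a common first thing to toggle when streams stall
        oplocks = no
        kernel oplocks = no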

[lustre-discuss] ZFS MDT Corruption

2022-09-16 Thread Christian Kuntz
Oof! That's not a good situation to be in. Unfortunately, I've hit the dual import situation before as well, and as far as I know once you have two nodes import a pool at the same time you're more or less hosed. When it happened to me, I tried using zdb to read all the recent TXGs to try to back
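For the archive, the zdb/rewind approach mentioned looks roughly like this; device path, pool name, and TXG are placeholders, and everything should stay read-only (a sketch, not a recommendation):

    # List uberblocks on a label to see recent transaction group (TXG) numbers
    zdb -ul /dev/disk/by-id/EXAMPLE-DISK

    # Documented rewind: let ZFS discard the last few TXGs if that helps
    zpool import -F -o readonly=on -f mdtpool

    # Last resort: rewind to an explicit TXG (-T is an unsupported/hidden
    # option in many builds, so treat this as a data-recovery gamble)
    zpool import -T <txg> -o readonly=on -f mdtpool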

[lustre-discuss] Changing default recovery window time settings

2022-08-04 Thread Christian Kuntz
as possible (within reason, of course). Alternatively, any way to manually end the recovery window would be appreciated. Cheers, and thanks for your attention, Christian Kuntz
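For anyone searching later, the relevant knobs and the manual escape hatch look roughly like this; the device names are placeholders and the values are examples only:

    # Shrink the recovery window (seconds) on a server target
    tunefs.lustre --param recovery_time_soft=60 --param recovery_time_hard=120 /dev/mdt0

    # Or end an in-progress recovery by hand on the server
    lctl --device lustre-MDT0000 abort_recovery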

Re: [lustre-discuss] Duplicate file prompt when transferring many files to lustre via Windows/Mac over Samba

2022-04-05 Thread Christian Kuntz
transfer with "calculating" in the transfer size bar. When it's > done "calculating" I get an additional prompt about " > exist in the destination" that lets me overwrite/skip. Sure enough, the > destination folder has been created with 817 0-byte files, all with the > approp

[lustre-discuss] Duplicate file prompt when transferring many files to lustre via Windows/Mac over Samba

2022-04-04 Thread Christian Kuntz
the appropriate names. The lowest number of 3K files I've seen it happen with thus far is 112. Cheers, Christian Kuntz

Re: [lustre-discuss] Lustre version 2.14 support for CentOS 7

2021-05-06 Thread Christian Kuntz
Hi Hugo, The autoconf has some detection that should be able to grab the SPL information for ZFS 0.8+ source dirs, so you may be able to scrub it out and let the scripting handle it (you can always double-check that it's correct by reading the configure logs). Can you forward any configuration errors you
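A sketch of what letting the scripting handle it can look like, assuming a build against a ZFS 0.8+ source tree (paths are placeholders):

    # Point configure at the ZFS tree and let autoconf find SPL, which is
    # folded into ZFS itself from 0.8 onward
    ./configure --with-zfs=/path/to/zfs --with-zfs-obj=/path/to/zfs-obj

    # Double-check what detection decided
    grep -i -e spl -e zfs config.log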

[lustre-discuss] o2ib nid connections timeout until an snmp ping

2021-03-09 Thread Christian Kuntz via lustre-discuss
Hello all, Requisite preamble: this is Debian 10.7 with Lustre 2.13.0 (compiled by yours truly). We've been observing some odd behavior recently with o2ib NIDs. Everyone's all connected over the same switch (cards and switch are all Mellanox), each machine has a single network card connected in
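For anyone reproducing this, the basic LNet-level connectivity checks for threads like this one, with a placeholder NID:

    # Ping a peer at the LNet layer rather than over IP
    lctl ping 192.168.1.10@o2ib

    # Inspect known peers and their connection state
    lnetctl peer show -v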

[lustre-discuss] Compiling 2.14 + ZFS 2.0.2 on Debian

2021-02-17 Thread Christian Kuntz
Hello, I hope I'm communicating in the right place here. I'm currently working to compile Lustre 2.14.0-RC2 on Debian 10.7 with ZFS 2.0.2 for OSDs. If there's anything I can do to help with the testing effort, or to help make Lustre's Debian support more robust, please let me know! I hope I'm not too
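For later readers, a rough sketch of the Debian build path being attempted; the tag name and ZFS paths are assumptions, not verified commands from the thread:

    git clone git://git.whamcloud.com/fs/lustre-release.git
    cd lustre-release
    git checkout v2_14_0-RC2   # tag naming may differ per release
    sh autogen.sh
    ./configure --with-zfs=/path/to/zfs --without-ldiskfs
    make debs                  # Lustre's tree carries a Debian packaging target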

[lustre-discuss] Failovermode=failout no longer supported?

2020-05-26 Thread Christian Kuntz
Hello all, I've been trying to test the failout failover mode, but instead of getting the "connection lost, in progress operations using this service will fail" message and a failure, I receive the "in progress operations will wait for recovery to complete" message and the operation hangs forever. I'm
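For reference, how that mode is typically set on a target (device path is a placeholder):

    # With failout set, in-flight I/O to a failed OST should return errors
    # instead of blocking until recovery completes
    tunefs.lustre --param failover.mode=failout /dev/ost0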

[lustre-discuss] OST with failover.mode=failout not failing out

2020-04-28 Thread Christian Kuntz
Hello all, I'm currently running 2.13.0 on Debian Buster with ZFS OSDs. My current setup is a simple cluster with all the components on the same node. Though the OST is marked as "failout", operations are still hanging indefinitely when they should fail after a timeout. Predictably, I get the
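One quick sanity check is to confirm the parameter actually persisted on disk (device path is a placeholder):

    # --dryrun prints the stored parameters without changing anything
    tunefs.lustre --dryrun /dev/ost0 | grep -i failover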

[lustre-discuss] Poor read performance when using o2ib nets over RoCE

2020-02-11 Thread Christian Kuntz
Hello, I've been running into a strange issue where my writes are blazingly fast (5.5 GB/s) over RoCE with Mellanox MCX516A-CCAT cards all running together over o2ib, but read performance tanks to roughly 100 MB/s. During mixed read/write situations, write performance also plummets to sub-100 MB/s.
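A common first step for this class of problem is taking the filesystem out of the picture with LNet selftest; the NIDs below are placeholders and the session layout follows the manual's brw example:

    # requires the lnet_selftest module loaded on all participating nodes
    export LST_SESSION=$$
    lst new_session read_bw
    lst add_group servers 192.168.1.20@o2ib
    lst add_group clients 192.168.1.10@o2ib
    lst add_batch bulk_read
    lst add_test --batch bulk_read --from clients --to servers brw read size=1M
    lst run bulk_read
    lst stat clients servers   # watch bandwidth for a while, then:
    lst end_session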