I can't comment on the LNet peer discovery part, but I would definitely not
recommend leaving lnet_transaction_timeout that low for normal usage. This
can cause messages to be dropped while the server is still processing them
and introduce failures needlessly.
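
If the lower timeout is used at all, one option is to apply it only for the
duration of the mount and restore it afterwards. A minimal sketch, assuming
the original value was 50 as reported below; MOUNT_SRC and MOUNT_DIR are
placeholders, and run() is a hypothetical wrapper that only prints the
commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: lower the LNet transaction timeout only while mounting, then restore it.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
# MOUNT_SRC and MOUNT_DIR are placeholders, not values from this thread.
MOUNT_SRC="${MOUNT_SRC:-MGSNID:/FSNAME}"
MOUNT_DIR="${MOUNT_DIR:-/mnt/lustre}"

# Hypothetical helper: execute the command only when DRY_RUN=0, else print it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run lnetctl set transaction_timeout 5     # temporary workaround from the report
run mount -t lustre "$MOUNT_SRC" "$MOUNT_DIR"
run lnetctl set transaction_timeout 50    # restore the original value
run lnetctl global show                   # check the resulting setting
```

The restore step runs even if the mount fails, so the node is not left with
the reduced timeout.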
Cheers, Andreas
> On Oct 26, 2023, at 09:48, Bertschinger, Thomas Andrew Hjorth via
> lustre-discuss wrote:
>
> Hello,
>
> Recently we had an OSS node down for an extended period with hardware
> problems. While the node was down, mounting lustre on a client took an
> extremely long time to complete (20-30 minutes). Once the fs is mounted, all
> operations are normal and there isn't any noticeable impact from the absent
> node.
>
> While the client is mounting, the client's debug log shows entries like this
> slowly going by:
>
> 0020:0080:87.0:1698333195.993098:0:3801046:0:(obd_config.c:1384:class_process_config())
> processing cmd: cf005
> 0020:0080:87.0:1698333195.993099:0:3801046:0:(obd_config.c:1396:class_process_config())
> adding mapping from uuid 10.1.2.3@o2ib to nid 0x50abcd123 (10.1.2.4@o2ib)
>
> and there is a "llog_process_th" kernel thread hanging in
> lnet_discover_peer_locked().
>
> We have peer discovery enabled on our clients, but disabling peer discovery
> on a client causes the mount to complete quickly. Also, once the down OSS was
> fixed and powered back on, mounting completed normally again.
>
> We also found that reducing the following timeout sped up the mount by a
> factor of ~10:
>
> $ lnetctl set transaction_timeout 5  # was 50 originally
>
> Is such a dramatic slowdown normal in this situation? Is there any fix (aside
> from disabling peer discovery or tuning down the timeout) that could speed up
> mounts in case we have another OSS down in the future?
>
> Lustre version (server and client): 2.15.3
>
> Thanks,
> Thomas Bertschinger
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org