Hi all,
I wonder whether I am seeing signs of network problems when mounting an OST.
tunefs.lustre --dryrun reports (as expected from my own format command):
>Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5
These are the NIDs for our MGS+MDT0; there are two more pairs for MDT1 and MDT2.
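
To make the structure of that parameter explicit, here is a minimal sketch that splits the mgsnode value into its failover NIDs. The function name parse_mgsnode is hypothetical, and the sketch assumes colon-separated IPv4-style NIDs exactly as shown above; it is not how Lustre itself parses the value.

```python
def parse_mgsnode(value):
    """Split an mgsnode value into (address, network) pairs.

    Assumption: failover NIDs are separated by ':' and each NID has the
    form <address>@<network>, as in the tunefs.lustre output quoted above.
    """
    nids = []
    for nid in value.split(":"):
        address, _, network = nid.partition("@")
        nids.append((address, network))
    return nids


print(parse_mgsnode("10.20.3.0@o2ib5:10.20.3.1@o2ib5"))
# -> [('10.20.3.0', 'o2ib5'), ('10.20.3.1', 'o2ib5')]
```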
I went step by step: modprobing lnet and lustre, then checking LNET with 'lnet ping' to the active MDTs,
which worked fine.
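
A rough sketch of that ping check, assuming the command meant is `lctl ping <nid>` and that it exits non-zero when the peer does not answer. The helper name ping_nids and the injectable run parameter are hypothetical, added only so the logic can be tried without a Lustre node.

```python
import subprocess


def ping_nids(nids, run=subprocess.run):
    """Return {nid: bool}, True when `lctl ping <nid>` exits with status 0.

    Assumption: a non-zero exit status means the NID did not respond.
    """
    results = {}
    for nid in nids:
        proc = run(["lctl", "ping", nid], capture_output=True, text=True)
        results[nid] = proc.returncode == 0
    return results


# Example call on a node with the lnet module loaded (NIDs taken from the
# mgsnode parameter above; which of them is currently active is not assumed):
# ping_nids(["10.20.3.0@o2ib5", "10.20.3.1@o2ib5"])
```

With LNET down on the failover partner, one would expect True for the active server and False for its partner.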
However, mounting such an OST (e.g. after a crash) at first prints a number of
> LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.20.3.1@o2ib5: 0 seconds
and similarly for the failover partners of the other two MDS.
Should it do that?
IMHO, LNET to a failover node _must_ fail, because LNET should not be up on the
failover node, right?
If I started LNET there and a client did not get an answer quickly enough from the active MDS, it
would try the failover node and find LNET up but Lustre not running, which doesn't sound right.
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de