Hi all,

I wonder if I am seeing signs of network problems when mounting an OST:


tunefs.lustre --dryrun reports what I already know from my own format command:
>Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5

These are the NIDs for our MGS+MDT0; there are two more pairs for MDT1 and MDT2.
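
For illustration, a minimal Python sketch of how that Parameters line breaks down into individual NIDs. It assumes that a colon in the mgsnode value separates the NIDs of the failover partners and that the addresses are IPv4 (an IPv6 address would itself contain colons). The function name parse_mgsnode is made up for this example and is not part of any Lustre tool.

```python
def parse_mgsnode(param_line):
    """Split a 'Parameters: mgsnode=...' line into its NIDs.

    Assumption: ':' separates the NIDs of failover partners and the
    addresses are IPv4. Hypothetical helper, not a Lustre API.
    """
    # Everything after 'mgsnode=' up to the next whitespace is the value.
    _, found, rest = param_line.partition("mgsnode=")
    fields = rest.split()
    if not found or not fields:
        return []
    nodes = []
    for nid in fields[0].split(":"):
        # A NID has the form <address>@<network>, e.g. 10.20.3.0@o2ib5.
        address, _, network = nid.partition("@")
        nodes.append({"nid": nid, "address": address, "network": network})
    return nodes


line = "Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5"
for node in parse_mgsnode(line):
    print(node["nid"], node["address"], node["network"])
```

With the line above this yields two entries, one per node of the MGS+MDT0 pair, both on network o2ib5.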

I went step by step: modprobing lnet and lustre, then checking LNET with 'lnet ping' to the active MDTs, which worked fine.

However, mounting such an OST (e.g. after a crash) first prints a number of messages like
> LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.20.3.1@o2ib5: 0 seconds

and similar ones for the failover partners of the other two MDSs.
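
To see at a glance which NIDs these messages refer to, they can be tallied per NID. The following is only a sketch: the regular expression is written against the single message quoted above, the names TIMEOUT_RE and count_tx_timeouts are made up for this example, and the second sample line is an invented unrelated line that is there only to show it gets skipped.

```python
import re
from collections import Counter

# Pattern derived from the one message quoted above; other Lustre
# versions may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"kiblnd_check_conns\(\)\) Timed out tx for (\S+): (\d+) seconds"
)


def count_tx_timeouts(log_lines):
    """Count 'Timed out tx' messages per NID (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) "
    "Timed out tx for 10.20.3.1@o2ib5: 0 seconds",
    "some unrelated kernel message",  # invented filler line
    "LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) "
    "Timed out tx for 10.20.3.1@o2ib5: 0 seconds",
]
for nid, n in count_tx_timeouts(sample).items():
    print(nid, n)
```

If only the NIDs of the passive failover partners show up in the tally, the timeouts match the pattern described here.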

Should it do that?


IMHO, LNET to a failover node _must_ fail, because LNET should not be up on the failover node, right?

If I started LNET there and some client did not get an answer quickly enough from the acting MDS, the client would try the failover node and find LNET up but Lustre not running. That doesn't sound right.


Regards,
Thomas

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz
