Re: [lustre-discuss] RDMA too fragmented, OSTs unavailable (permanently)

2016-09-22 Thread Oucharek, Doug S
Hi Thomas,

It is interesting that you have encountered this error without a router.  Good 
information.   I have updated LU-5718 with a link to this discussion.

The original fix posted to LU-5718 by Liang will fix this problem for you (it 
does not assume a router is the cause).  That fix does, however, double the 
amount of memory used per QP.  That is probably not an issue for a client, but 
it could be an issue for a router (as Cray has found).

Are you using the quotas feature?  There is some evidence that quotas may play 
a role here.
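
If you want to check, here is a quick sketch (assuming Lustre 2.4+ style quota 
configuration; "testfs" is a placeholder for your filesystem name):

  # on each MDS/OSS: show which quota types are currently enforced
  lctl get_param osd-*.*.quota_slave.info
  # on the MGS: this would switch OST quota enforcement off entirely
  lctl conf_param testfs.quota.ost=none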

Doug

> On Sep 10, 2016, at 12:38 AM, Thomas Roth  wrote:
> 
> Hi all,
> 
> we are running Lustre 2.5.3 on InfiniBand. We have massive problems with 
> clients being unable to communicate with varying numbers of OSTs, rendering 
> the entire cluster nearly unusable.
> 
> Clients show
> > LNetError: 1399:0:(o2iblnd_cb.c:1140:kiblnd_init_rdma()) RDMA too 
> > fragmented for 10.20.0.242@o2ib1 (256): 231/256 src 231/256 dst frags
> > LNetError: 1399:0:(o2iblnd_cb.c:1690:kiblnd_reply()) Can't setup rdma for 
> > GET from 10.20.0.242@o2ib1: -90
> 
> which eventually results in OSTs at that NID becoming "temporarily 
> unavailable".
> However, the OSTs never recover until the client is manually evicted or the 
> host is rebooted.
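> For reference, by "manually evicted" we mean something like the following on 
> the OSS (the OST name is a placeholder; the NID is the client from the log 
> below):
> > lctl set_param obdfilter.testfs-OST0000.evict_client=10.20.0.220@o2ib1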
> 
> On the OSS side, this reads
> >  LNetError: 13660:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA 
> > with 10.20.0.220@o2ib1 (56): c: 7, oc: 0, rc: 7
> 
> 
> We have checked the IB fabric, which shows no errors. Since we are not able 
> to reproduce this effect in a simple way, we have also scrutinized the user 
> code, so far without results.
> 
> Whenever this happens, the connection between client and OSS is fine under 
> all IB test commands.
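> As a sketch of what we mean by IB test commands (tools from infiniband-diags 
> and perftest; options trimmed):
> > ibqueryerrors               # fabric-wide port error counters
> > ib_read_bw                  # on the OSS: start an RDMA-read bandwidth server
> > ib_read_bw 10.20.0.242      # on the client: RDMA-read test against that OSS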
> Communication between client and OSS is still going on, but obviously when 
> Lustre tries to replay the missed transaction, this fragmentation limit is 
> hit again, so the OST never becomes available again.
> 
> If we understand correctly, the map_on_demand parameter should be increased 
> as a workaround.
> The ko2iblnd module seems to provide this parameter,
> > modinfo ko2iblnd
> > parm:   map_on_demand:map on demand (int)
> 
> but no matter what we load the module with, map_on_demand always remains at 
> the default value,
> > cat /sys/module/ko2iblnd/parameters/map_on_demand
> > 0
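> For reference, we load the module the usual way, via a modprobe configuration 
> file (a sketch; 256 is just an example value):
> > cat /etc/modprobe.d/ko2iblnd.conf
> > options ko2iblnd map_on_demand=256
> and reload the stack (lustre_rmmod, then modprobe lustre) before checking the 
> parameter again.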
> 
> Is there any way to understand
> - why this memory fragmentation occurs/becomes so large?
> - how to measure the actual degree of fragmentation (o2iblnd simply stops 
> counting at 256; perhaps we are at 1000?)
> - why map_on_demand cannot be changed?
> 
> 
> Of course this all looks very much like LU-5718, but our clients are not 
> behind LNET routers.
> 
> There is one router which connects to the campus network but is not in use. 
> And there are some routers which connect to an older cluster, but of course 
> the old (1.8) clients never show any of these errors.
> 
> 
> Cheers,
> Thomas
> 
> 
> Thomas Roth
> Department: HPC
> Location: SB3 1.262
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
> 
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
> 
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
> 
> Geschäftsführung: Professor Dr. Karlheinz Langanke
> Ursula Weyrich
> Jörg Blaurock
> 
> Vorsitzender des Aufsichtsrates: Staatssekretär Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OpenSFS Transition and Futures

2016-09-22 Thread ssimms

Please accept my sincere apologies if this reaches you more than once.

Dear Members of the Lustre Community,

I write to you now with passion and enthusiasm about the restructuring and 
transformation of OpenSFS into a user-driven organization dedicated to 
addressing the current and future needs of Lustre users.


It is my sincere pleasure to announce that on Thursday last week, after months 
of discussion and careful consideration, the OpenSFS Board transferred the 
organization to a new temporary board of users representing academia, business, 
and the national laboratories.  In addition to myself, the temporary board 
consists of:


Shawn Hall, BP
Steve Monk, Sandia National Laboratories
Sarp Oral, Oak Ridge National Laboratory
Rick Wagner, Globus (formerly San Diego Supercomputer Center)

This board will remain in place until an election can be held at this year's 
Lustre User Group meeting (a 'save the date' message will be coming soon).


OpenSFS has accomplished many great things since its inception, providing 
leadership, manpower, and capital that have improved Lustre and ensured its 
place in the HPC ecosystem.  Now the time has come for those who rely most on 
Lustre, its users, to guide OpenSFS into the future to provide:


- elected leadership
- a unified voice
- a user-run Lustre User Group meeting
- support for the Lustre Working Group
- support for lustre.org along with EOFS
- chances for frank and direct contact between vendors and users

To encourage participation from users and vendors alike, the membership model 
has been flattened to two categories and dues reduced significantly:


Members (user organizations) - $1,000 annual dues
- voting rights
- eligibility to serve on the board
- eligibility to serve on the LUG planning committee
- eligibility to participate in requirements gathering

Participants (vendor organizations) - $5,000 annual dues
- support community efforts to promote Lustre
- opportunities for direct contact with the user community
- access to the community requirements gathering exercise
- eligibility to attend OpenSFS member meetings

These changes are a positive step forward for OpenSFS and our community, and we 
would love to have your involvement to help ensure that Lustre remains open and 
to help shape Lustre's future.


If you have questions about the organizational changes, would like to 
volunteer, or want to discuss future objectives, feel free to reach out to me or 
any of the temporary board members, or send mail to ad...@opensfs.org.  In the 
meantime, we will be moving forward with new streamlined bylaws, available here:


http://cdn.opensfs.org/wp-content/uploads/2016/09/Open-SFS-Amendment-and-Restated-Bylaws_Final_091516.pdf

In closing, I want to thank Mark Seager from Intel for his crucial role in 
founding OpenSFS in 2010 before his departure from Lawrence Livermore National 
Laboratory, Charlie Carroll from Cray for his effort and leadership as chairman 
of the board, and all former board members and their organizations for making 
this transition possible.


It has been an honor serving as the community board representative and I look 
forward to continued service as a member of the temporary OpenSFS board.


Sincerely,
Stephen Simms
OpenSFS Temporary Board Member

Manager, High Performance File Systems
Indiana University
ssi...@iu.edu
812-855-7211
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-09-22 Thread Pardo Diaz, Alfonso
Machines are running Lustre version 2.8.0 (clients and servers).
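
For context, a first-time mount can only fail over if the backup NID is listed 
in the mount command itself; a minimal sketch (host names and filesystem name 
are made up):

  # colon-separated MGS NIDs are tried in order: primary first, then failover
  mount -t lustre mds1@o2ib:mds2@o2ib:/testfs /mnt/testfs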


> On Sep 20, 2016, at 17:32, Mohr Jr, Richard Frank (Rick Mohr) 
>  wrote:
> 
> 
>> On Sep 19, 2016, at 2:40 AM, Pardo Diaz, Alfonso  
>> wrote:
>> 
>> I am still having the same problem on my system. My clients get stuck on the 
>> primary MDS, which is down, and do not use the backup (service MDS); this 
>> happens only when they try to connect to it for the first time.
>> As I said in previous messages, a client that was already connected while the 
>> primary was up can use the service MDS without problems.
>> 
>> Any suggestion?
> 
> Unfortunately, no.  Did you ever mention which Lustre version you are 
> running?  I don’t recall seeing that.
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org