Re: [lustre-discuss] Lustre routing help needed

2017-11-01 Thread Kevin M. Hildebrand
So apparently the issue is indeed with the combination of using a Lustre 2.10.1 router with 2.8 servers and clients. Downgrading the router to 2.9 seems to have solved the problem. (I can't run 2.8 on the router, because I'm running MOFED 4.1 for the Mellanox ConnectX-5, and I can't get 2.8 to

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Dilger, Andreas
The 2.10 release added support for multi-rail LNet, which may be causing problems here. I would suggest installing an older LNet version on your routers to match your clients and servers. You may need to build your own RPMs for your new kernel, but can use --disable-server for configure

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
Thanks, I completely missed that. Indeed the ko2iblnd parameters were different between the servers and the router. I've updated the parameters on the router to match those on the server, and things haven't gotten any better. (The problem appears to be on the Ethernet side anyway, so you've

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand wrote:
>
> All of the hosts (client, server, router) have the following in ko2iblnd.conf:
>
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
I received a reply from Alejandro suggesting that I check live_router_check_interval, dead_router_check_interval and router_ping_timeout. I had those set to the defaults, which I assume are 60, 60, and 50 seconds respectively. I did just try setting those values explicitly, and I'm not seeing any
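
For reference, setting the three router-checker parameters named above explicitly might look like the following sketch. It assumes they are set as lnet module options in a modprobe.d file (the file name lnet.conf is an assumption, not from the thread), and the values are simply the ones mentioned in the message, which the poster assumed to be the defaults.

```
# /etc/modprobe.d/lnet.conf (file name is an assumption)
# Values are in seconds and are those quoted in the message above.
options lnet live_router_check_interval=60 dead_router_check_interval=60 router_ping_timeout=50
```

Module options of this kind are read when the lnet module loads, so LNet would typically need to be reloaded on the affected hosts before a change takes effect.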

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread LOPEZ, ALEXANDRE
: Monday, October 30, 2017 1:47 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre routing help needed

Hello, I'm trying to set up some new Lustre routers between a set of InfiniBand-connected Lustre servers and a few hosts connected to an external 100G Ethernet network

[lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
Hello, I'm trying to set up some new Lustre routers between a set of InfiniBand-connected Lustre servers and a few hosts connected to an external 100G Ethernet network. The problem I'm having is that the routers work just fine for a minute or two, and then shortly thereafter they're marked as