Re: [lustre-discuss] Lnet and IPv6

2016-05-25 Thread Oucharek, Doug S
The situation described in those two links has not changed. IPv6 is not supported by LNet for the reasons given in the second link (i.e., it is a lot of work in multiple places). Doug On May 25, 2016, at 5:54 PM, Frederick Lefebvre
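For context, LNet NIDs currently encode an IPv4 address directly, which is part of why IPv6 support touches so many places. A minimal sketch of the usual IPv4-based configuration (interface name and address are illustrative assumptions):

    # /etc/modprobe.d/lustre.conf
    options lnet networks="tcp0(eth0)"
    # the node's NID then takes the form <IPv4-address>@tcp0, e.g. 192.168.1.10@tcp0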

Re: [lustre-discuss] poor performance on reading small files

2016-08-03 Thread Oucharek, Doug S
Also note: if you are using IB, these small reads will make use of RDMA. LNet only uses RDMA writes (for historical reasons), so the client has to use IB immediate messages to tell the server to write the 20 KB file to the client. The extra round-trip handshake involved with this will add

Re: [lustre-discuss] difficulties mounting client via an lnet router

2016-07-11 Thread Oucharek, Doug S
You mentioned that the servers are on the o2ib0 network, but the error messages indicate that the client is trying to communicate with the MDT on the tcp network. The file system configuration needs to be updated to use the updated NIDs. Doug > On Jul 11, 2016, at 7:34 AM, Jessica Otey
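A hedged sketch of the two pieces usually involved here, with all NIDs, device names, and the file system name as illustrative assumptions: the client needs a route to the o2ib0 network through the LNet router, and the targets' recorded NIDs need updating if the servers changed networks.

    # Client side: reach o2ib0 via a router whose tcp NID is assumed to be 10.0.0.1@tcp0
    options lnet networks="tcp0(eth0)" routes="o2ib0 10.0.0.1@tcp0"

    # On the MGS (targets stopped), one way (Lustre 2.4+) to rewrite stale NIDs in the config logs:
    lctl replace_nids testfs-MDT0000 192.168.1.1@o2ib0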

Re: [lustre-discuss] LNET Self-test

2017-02-07 Thread Oucharek, Doug S
…cted bandwidth in the [W]-position, whereas "brw write" reports it in the [R]-position. > This is on CentOS-6.5/Lustre-2.5.3. Will try 7.3/2.9.0 later. > Thanks, > /jon > On 02/06/2017 05:45 PM, Oucharek, Doug S wrote: >> Try running just a read

Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Oucharek, Doug S
Yes, you can bump your concurrency. Size caps out at 1M because that is how LNet is set up to work; going over 1M would result in an unrealistic Lustre test. Doug > On Feb 5, 2017, at 11:55 AM, Jeff Johnson wrote: > Without seeing your entire
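If it helps, a hedged example of bumping concurrency while staying at the 1M size cap (the batch and group names are assumptions carried over from a typical lst script):

    lst add_test --batch bulk_rw --concurrency 16 --from clients --to servers brw write size=1M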

Re: [lustre-discuss] LNET Self-test

2017-02-06 Thread Oucharek, Doug S
…er RPC sizes available. Is there some reason that's not true? - Patrick From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Oucharek, Doug S <doug.s.oucha...@intel.com>

Re: [lustre-discuss] LNET Self-test

2017-02-06 Thread Oucharek, Doug S
Try running just a read test and then just a write test rather than having both at the same time and see if the performance goes up. Doug > On Feb 6, 2017, at 4:40 AM, Jon Tegner wrote: > > Hi, > > I used the following script: > > #!/bin/bash > export LST_SESSION=$$ > lst
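A minimal sketch of running the read case on its own as suggested above, assuming the NIDs used as group members are placeholders and lnet_selftest is loaded on every participating node:

    #!/bin/bash
    # modprobe lnet_selftest on all nodes first
    export LST_SESSION=$$
    lst new_session read_only
    lst add_group clients 192.168.1.10@o2ib
    lst add_group servers 192.168.1.20@o2ib
    lst add_batch bulk_read
    lst add_test --batch bulk_read --concurrency 8 --from clients --to servers brw read size=1M
    lst run bulk_read
    lst stat clients servers & sleep 30; kill $!
    lst end_session
    # repeat in a second session with "brw write" for the write-only case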

Re: [lustre-discuss] RDMA too fragmented, OSTs unavailable (permanently)

2016-09-22 Thread Oucharek, Doug S
Hi Thomas, It is interesting that you have encountered this error without a router. Good information. I have updated LU-5718 with a link to this discussion. The original fix posted to LU-5718 by Liang will fix this problem for you (it does not assume a router is the cause). That fix does

Re: [lustre-discuss] RDMA too many fragments/timed out - clients slowing entire filesystem performance

2016-11-01 Thread Oucharek, Doug S
Hi Brian, You need this patch: http://review.whamcloud.com/#/c/12451. It has not landed on master yet and is off by default. To activate it, add this module parameter line on all of your nodes: options ko2iblnd wrq_sge=2 The issue is that something is causing an offset to be
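A hedged way to make that setting persistent and to confirm it took effect once the patched module is reloaded (the file name is a convention, not a requirement):

    echo "options ko2iblnd wrq_sge=2" >> /etc/modprobe.d/ko2iblnd.conf
    # after unloading/reloading the modules on the node, if the patched module exposes the parameter:
    cat /sys/module/ko2iblnd/parameters/wrq_sge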

Re: [lustre-discuss] building lustre from source rpms, mellanox OFED, CentOS 6.8

2016-12-16 Thread Oucharek, Doug S
What distro do you want to build for? If RHEL 7.3, the instructions Brett quoted no longer work, thanks to weak module loading being activated. Doug On Dec 16, 2016, at 9:43 AM, Brett Lee wrote: Hi Lana, Here's a link:
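For reference, a rough sketch of building Lustre RPMs from source against a Mellanox OFED install (the paths are the usual MOFED and kernel-devel defaults and may differ on your system):

    git clone git://git.whamcloud.com/fs/lustre-release.git && cd lustre-release
    sh autogen.sh
    ./configure --with-linux=/usr/src/kernels/$(uname -r) \
                --with-o2ib=/usr/src/ofa_kernel/default
    make rpms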

Re: [lustre-discuss] client fails to mount

2017-04-25 Thread Oucharek, Doug S
That specific message happens when the “magic” u32 field at the start of a message does not match what we are expecting. We do check whether the message was transmitted with a different endianness than ours, so when you see this error, we assume that the message has been corrupted or the sender is using an

Re: [lustre-discuss] Does Lustre support RoCE?

2017-05-12 Thread Oucharek, Doug S
… wrote: Thanks for the advice. I had a hunch that the development will take time. Regards, Indivar Nair On Thu, May 11, 2017 at 11:28 PM, Oucharek, Doug S <doug.s.oucha...@intel.com> wrote: As I write this, I am banging my head against this wall
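For anyone following the thread: RoCE still goes through the o2ib LND rather than a separate driver. A hedged sketch of how that is typically declared, with the Ethernet interface name as an assumption:

    # point the o2ib LND at the RoCE-capable Ethernet port
    options lnet networks="o2ib0(ens2f0)"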

Re: [lustre-discuss] Does Lustre support RoCE?

2017-05-11 Thread Oucharek, Doug S
The note regarding MOFED 4 not being supported by Lustre: I’m working on it. MOFED 4 did not drop support for Lustre, but it did make API/behaviour changes which Lustre has not yet fully adapted to. The ball is in the Lustre community’s court on this one now. Doug On May 11, 2017, at 8:47 AM, Simon

Re: [lustre-discuss] Does Lustre support RoCE?

2017-05-11 Thread Oucharek, Doug S
Thanks a lot, Michael, Andreas, Simon, Doug. I have already installed MLNX OFED 4 :-( I will now have to undo it and install the earlier version. Roughly, by when would support for MLNX OFED 4 be available? Regards, Indivar Nair On Thu, May 11, 2017 at 9:35 PM, Oucharek, Doug S <doug.s.ouc

Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-17 Thread Oucharek, Doug S
How is it you are getting the same NID registering twice in the log file: Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180] Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180] Doug On May 17, 2017, at 11:04 AM, Jessica Otey
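A couple of hedged checks for tracking down where a duplicate registration might come from (the file locations assume a 2.5-era static module configuration):

    lctl list_nids                                  # NIDs actually active on the node
    grep -r "networks\|ip2nets" /etc/modprobe.d/    # look for a duplicated lnet option line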

Re: [lustre-discuss] Clients lose IB connection to OSS.

2017-05-01 Thread Oucharek, Doug S
For the “RDMA has too many fragments” issue, you need the newly landed patch: http://review.whamcloud.com/12451. For the slow access, I am not sure if that is related to the too many fragments error. Once you get the too many fragments error, that node usually needs to unload/reload the LNet module to
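A hedged outline of the unload/reload cycle mentioned above, assuming Lustre use on the node can be stopped first:

    umount -a -t lustre       # stop Lustre use of LNet on the node
    lustre_rmmod              # unloads the lustre/lnet/ko2iblnd modules
    modprobe lustre           # reload; LNet comes back up on the next mount (or via lctl network up)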

Re: [lustre-discuss] slow mount of lustre CentOS6 clients to 2.9 servers

2017-05-05 Thread Oucharek, Doug S
Are the NIDs “192.168.xxx.yyy@o2ib” really configured that way, or did you modify those logs when pasting them into the email? Doug On May 5, 2017, at 11:02 AM, Grigory Shamov wrote: Hi All, We were installing a new Lustre storage.

Re: [lustre-discuss] Lustre on Mellanox multi-host infiniband problem

2017-05-05 Thread Oucharek, Doug S
I’m not sure I understand what version of MOFED you are using. Can you verify whether this is MOFED 3.x or 4.x? Doug On May 5, 2017, at 9:51 AM, HM Li wrote: Confirmed. This is a bug in the git version (2.9.55_45); it works well when using
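A quick, hedged way to confirm which MOFED stack is installed and get a hint of what the loaded LND was built against:

    ofed_info -s                 # prints the installed MLNX_OFED version string, e.g. 3.x vs 4.x
    modinfo ko2iblnd | head      # vermagic/depends lines hint at the ib_core it was built against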

Re: [lustre-discuss] Lustre on Mellanox multi-host infiniband problem

2017-05-05 Thread Oucharek, Doug S
The tag you checked out is missing this fix: https://review.whamcloud.com/#/c/24306/. Try applying that. Doug On May 5, 2017, at 9:51 AM, HM Li wrote: Confirmed. This is a bug in the git version (2.9.55_45); it works well when using
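A hedged sketch of pulling that change onto a local tree (the patch-set number is a placeholder; Gerrit's Download box shows the exact ref):

    cd lustre-release
    git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/06/24306/<PATCHSET>
    git cherry-pick FETCH_HEAD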

Re: [lustre-discuss] Lustre on Mellanox multi-host infiniband problem

2017-05-08 Thread Oucharek, Doug S
…-4.0-1.0.1.0-rhel7.3-x86_64. This driver and Lustre (git, 2.9.55_45) can work well on other normal FDR nodes. On May 6, 2017, at 01:14, Oucharek, Doug S wrote: The tag you checked out is missing this fix: https://review.whamcloud.com/#/c/24306/. Try applying that. Doug On May 5, 2017, at 9:51 A