Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-06 Thread Amir Shehata
No problem.

On Wed, 6 Mar 2019 at 12:15, Riccardo Veraldi wrote:
> On 3/6/19 11:29 AM, Amir Shehata wrote:
> > The reason for the load being split across the tcp and o2ib0 for the 2.12 client is that the MR code sees both interfaces, realizes it can use both of them, and so it does.

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-06 Thread Amir Shehata
The load is split across tcp and o2ib0 on the 2.12 client because the Multi-Rail (MR) code sees both interfaces, realizes it can use both of them, and so it does. To disable this behavior, you can disable discovery on the 2.12 client. I think that should just get the client to only
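As a rough illustration of the suggestion above, here is a minimal sketch that builds the command for switching LNet dynamic discovery off at runtime. It assumes the `lnetctl set discovery` subcommand shipped with Lustre 2.11 and later; the helper names `discovery_command` and `apply` are hypothetical and not part of any Lustre tooling. The sketch only runs the command if `lnetctl` is actually installed.

```python
import shutil
import subprocess


def discovery_command(enabled):
    """Build the lnetctl argv that turns LNet dynamic peer discovery on or off."""
    return ["lnetctl", "set", "discovery", "1" if enabled else "0"]


def apply(cmd):
    """Run cmd only if its executable is installed; return True on success."""
    if shutil.which(cmd[0]) is None:
        # Dry run: show what would have been executed on a Lustre node.
        print("not installed; would run:", " ".join(cmd))
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    # Disable discovery on the 2.12 client, then print the global LNet settings.
    apply(discovery_command(enabled=False))
    apply(["lnetctl", "global", "show"])
```

A runtime change like this is not expected to survive an LNet module reload. A persistent equivalent is probably the `lnet_peer_discovery_disabled=1` module option in a modprobe configuration file, but check that against the documentation for your release.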

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-06 Thread Riccardo Veraldi
Hello Amir, I answer inline.

On 3/5/19 3:42 PM, Amir Shehata wrote:
It looks like the ping is passing. Did you try it several times to make sure it always pings successfully? The way it works is the MDS (2.12) discovers all the interfaces on the peer. There is a concept of the primary NID for

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-05 Thread Riccardo Veraldi
It is not exactly this problem. Here is my setup:

* MDS is on tcp0
* client is on tcp0 and o2ib0
* OSS is on tcp0 and o2ib0

The problem is that the MDS is discovering both the Lustre client and the OSS over o2ib, and it should not, because the MDS has only one Ethernet interface. I
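The setup listed above can be modelled with a small sketch that computes which LNet networks two nodes have in common. This is only an illustration of the topology described in the message: the names `NODE_NETS` and `shared_nets` are hypothetical, and nothing here calls into Lustre or LNet.

```python
# Networks each node is attached to, as listed in the message above.
NODE_NETS = {
    "mds": {"tcp0"},
    "client": {"tcp0", "o2ib0"},
    "oss": {"tcp0", "o2ib0"},
}


def shared_nets(a, b):
    """Return the sorted list of LNet networks that both nodes are attached to."""
    return sorted(NODE_NETS[a] & NODE_NETS[b])


if __name__ == "__main__":
    for pair in [("mds", "client"), ("mds", "oss"), ("client", "oss")]:
        print(pair, shared_nets(*pair))
```

In this model the MDS shares only tcp0 with the client and the OSS, so any o2ib0 NIDs it learns about are on a network it has no local interface for. The client and the OSS share both networks, which fits the load split across tcp and o2ib0 described elsewhere in the thread.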

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-05 Thread Amir Shehata
Take a look at this: https://jira.whamcloud.com/browse/LU-11840

Let me know if this is the same issue you're seeing.

On Tue, 5 Mar 2019 at 14:04, Amir Shehata wrote:
> Hi Riccardo,
>
> It's not LNet Health. It's Dynamic Discovery. What's happening is that 2.12 is discovering all the

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-05 Thread Amir Shehata
Hi Riccardo,

It's not LNet Health. It's Dynamic Discovery. What's happening is that 2.12 is discovering all the interfaces on the peer. That's why you see all the interfaces in the peer show. Multi-Rail doesn't enable o2ib. It just sees it. If the node doing the discovery has only tcp, then it

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-05 Thread Riccardo Veraldi
I think I figured out the problem. My problem is related to the LNet Network Health feature: https://jira.whamcloud.com/browse/LU-9120

The Lustre MDS and the Lustre client, both running version 2.12.0, negotiate a Multi-Rail peer connection, while this does not happen with the other clients (2.10.5).

Re: [lustre-discuss] Lustre 2.12.0 and locking problems

2019-03-05 Thread Patrick Farrell
Riccardo,

Since 2.12 is still a relatively new maintenance release, it would be helpful if you could open an LU and provide more detail there, such as what the clients were doing, whether you were using any new features (like DoM or FLR), and full dmesg from the clients and servers involved in these