Re: [lustre-discuss] frequent Connection lost, Connection restored to mdt

2019-12-23 Thread David Cohen
Hi,
Yes, I do see load on the client side, but since the client has a 40 Gb/s NIC
and the load comes from a 10 Gb/s WAN link, I wouldn't expect it to overload
the network.
I can correlate the messages with WAN load above 6 Gb/s, which is far below
the limit of the NIC.
The client has a latest-generation Xeon processor, so I wouldn't expect that
to be the bottleneck either.

David
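
For anyone who wants to repeat the correlation described above, here is a minimal sketch that buckets the client's "Connection to ... was lost" messages by minute so they can be lined up against throughput samples. It assumes the kernel messages sit in a syslog-style file with one entry per line (not wrapped as in the email); the function name `lost_per_minute` and the `SAMPLE` list are made up for illustration. Because the kernel also logs "Skipped N previous similar messages", the counts are a lower bound.

```python
import re
from collections import Counter

# Matches a syslog-style line such as:
#   Dec 22 15:26:34 gftp kernel: Lustre: <import>: Connection to <target> (at <nid>) was lost; ...
# and captures the timestamp truncated to the minute.
LOST = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ kernel: Lustre: "
    r".*Connection to .* was lost"
)


def lost_per_minute(lines):
    """Count 'Connection ... was lost' messages per minute (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOST.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


# Two client-side lines from the original report, joined onto single lines.
SAMPLE = [
    "Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: "
    "Connection to lustre-MDT (at 10.0.0.1@tcp) was lost; in progress "
    "operations using this service will wait for recovery to complete",
    "Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: "
    "Connection restored to 10.0.0.1@tcp (at 192.114.101.153@tcp)",
]

if __name__ == "__main__":
    for minute, count in sorted(lost_per_minute(SAMPLE).items()):
        print(minute, count)
```

The per-minute keys can then be compared with whatever interface or WAN counters are sampled at the same resolution.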


On Mon, Dec 23, 2019 at 5:09 PM Degremont, Aurelien wrote:

> Hi
>
> These messages mean the client thinks it has lost communication with
> the server and reconnects. The server only sees the reconnection and never
> thought the client was gone.
>
> It could be related to lots of things. The server could be receiving RPCs
> from this client but not processing them fast enough. Are there other errors
> on your server? Is there any high load?
>
> Same on your clients? Is there any high load that could prevent your
> client from communicating with your server properly?
>
> Do you correlate that with some specific load running on your clients?
>
> Aurélien
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] frequent Connection lost, Connection restored to mdt

2019-12-23 Thread Degremont, Aurelien
Hi

These messages mean the client thinks it has lost communication with the
server and reconnects. The server only sees the reconnection and never thought
the client was gone.

It could be related to lots of things. The server could be receiving RPCs from
this client but not processing them fast enough. Are there other errors on your
server? Is there any high load?
Same on your clients? Is there any high load that could prevent your client
from communicating with your server properly?

Do you correlate that with some specific load running on your clients?

Aurélien
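
To make the client-side timeout concrete, here is a small sketch that pulls the send time and deadline out of the `ptlrpc_expire_one_request()` line from the original report. It assumes that `sent` and `dl` are Unix timestamps in seconds (send time and request deadline) and that `o36` is the RPC opcode; the function name `parse_expired_request` and the regular expression are only illustrative, not part of any Lustre tool.

```python
import re
from datetime import datetime, timezone

# The expired-request line from the original report, joined onto one line.
LINE = (
    "Dec 22 15:26:34 gftp kernel: Lustre: "
    "439834:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has "
    "timed out for slow reply: [sent 1577021187/real 1577021187]  "
    "req@88160be9c6c0 x1653620348981536/t0(0) "
    "o36->lustre-MDT-mdc-8817d9776c00@10.0.0.1@tcp:12/10 lens 608/4768 "
    "e 0 to 1 dl 1577021194 ref 2 fl Rpc:X/0/ rc 0/-1"
)

# Assumed field meanings: "sent" = send time, "o<N>" = opcode, "dl" = deadline.
PATTERN = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\].*?"
    r"\bo(?P<opcode>\d+)->(?P<target>\S+).*?"
    r"\bdl (?P<deadline>\d+)"
)


def parse_expired_request(line):
    """Return the timing fields of an expired-request line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    sent = int(match.group("sent"))
    deadline = int(match.group("deadline"))
    return {
        "opcode": int(match.group("opcode")),
        "target": match.group("target"),
        "sent_utc": datetime.fromtimestamp(sent, tz=timezone.utc),
        "deadline_utc": datetime.fromtimestamp(deadline, tz=timezone.utc),
        "waited_s": deadline - sent,  # seconds between send time and deadline
    }


if __name__ == "__main__":
    info = parse_expired_request(LINE)
    print(info["opcode"], info["target"], info["waited_s"])
```

Under those assumptions, the request in the report was given 7 seconds before the client declared it expired and reconnected.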

From: lustre-discuss on behalf of David Cohen
Date: Sunday, 22 December 2019 at 17:08
To: "lustre-discuss@lists.lustre.org"
Subject: [lustre-discuss] frequent Connection lost, Connection restored to mdt

Hi,
We are running 2.10.5 on the servers and 2.10.8 on the clients.
Every few minutes, we see:

On client side:

Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1577021187/real 1577021187]  req@88160be9c6c0 x1653620348981536/t0(0) o36->lustre-MDT-mdc-8817d9776c00@10.0.0.1@tcp:12/10 lens 608/4768 e 0 to 1 dl 1577021194 ref 2 fl Rpc:X/0/ rc 0/-1
Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: Connection to lustre-MDT (at 10.0.0.1@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: Connection restored to 10.0.0.1@tcp (at 192.114.101.153@tcp)
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages

On server side:

Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT: Client 38d6eef1-e146-be41-bab9-409b272d0d4f (at 10.0.0.10@tcp) reconnecting
Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT: Connection restored to ec2cdfce-353f-583a-c970-fde3f5d5189c (at 10.0.0.10@tcp)



[lustre-discuss] frequent Connection lost, Connection restored to mdt

2019-12-22 Thread David Cohen
Hi,
We are running 2.10.5 on the servers and 2.10.8 on the clients.
Every few minutes, we see:

On client side:

Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1577021187/real 1577021187]  req@88160be9c6c0 x1653620348981536/t0(0) o36->lustre-MDT-mdc-8817d9776c00@10.0.0.1@tcp:12/10 lens 608/4768 e 0 to 1 dl 1577021194 ref 2 fl Rpc:X/0/ rc 0/-1
Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: Connection to lustre-MDT (at 10.0.0.1@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT-mdc-8817d9776c00: Connection restored to 10.0.0.1@tcp (at 192.114.101.153@tcp)
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages

On server side:

Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT: Client 38d6eef1-e146-be41-bab9-409b272d0d4f (at 10.0.0.10@tcp) reconnecting
Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT: Connection restored to ec2cdfce-353f-583a-c970-fde3f5d5189c (at 10.0.0.10@tcp)