Looks like I can kind of make it happen on one system now.
Stopping some programs (no pattern in which ones) makes it work, and starting 
some back up again makes it fail.

The TIPC nametable has 231 entries when failing and 183 entries when succeeding 
(however, on a different system the nametable has 251 entries and it is not 
failing).

How do I look for memory used by TIPC in the kernel?
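
One rough starting point, assuming the failure is about the kernel not finding a large enough contiguous free block for a big buffer (an assumption, nothing in this thread confirms it), is to watch /proc/buddyinfo while the sends fail and while they succeed. The sketch below only parses that file's standard layout (one line per node and zone, followed by free block counts per order); the helper names are made up for illustration, and the choice of order 3 (32 KiB with 4 KiB pages) for a buffer just over 16 KiB is likewise an assumption.

```python
import re


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, starting at order 0]}."""
    zones = {}
    for line in text.splitlines():
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)", line)
        if m:
            counts = [int(n) for n in m.group(3).split()]
            zones[(int(m.group(1)), m.group(2))] = counts
    return zones


def free_blocks_from_order(counts, min_order):
    """Count free blocks that span at least 2**min_order contiguous pages."""
    return sum(counts[min_order:])


def report(path="/proc/buddyinfo", min_order=3):
    """Print, per zone, how many free blocks of min_order or larger exist."""
    with open(path) as f:
        zones = parse_buddyinfo(f.read())
    for (node, zone), counts in zones.items():
        n = free_blocks_from_order(counts, min_order)
        print(f"node {node} zone {zone}: free blocks of order >= {min_order}: {n}")
```

Calling report() in the failing state and again after stopping the programs that make it work would show whether the number of larger free blocks differs between the two. If it does not, fragmentation is probably not the explanation.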

-----Original Message-----
From: Rune Torgersen <ru...@innovsys.com> 
Sent: Thursday, October 17, 2019 14:53


I will have to look for leaks next time I can make it happen.
I was trying stuff and shut down a different program that was unrelated (but 
had some TIPC sockets open on a different address (104)), and as soon as I did, 
the sends started working again.

It is possible that one of those unrelated sockets has something stuck (as one 
of them was only ever used to send RDM messages but nothing ever reads it).

Any suggestions as to what to start looking at (netstat, tipc, tipc_config or 
kernel params) to try to track it down?

The problem with testing a patch (or using Ubuntu 18 LTS) is that we cannot 
reliably make it happen.

-----Original Message-----
From: Jon Maloy <jon.ma...@ericsson.com>
Sent: Thursday, October 17, 2019 14:35


Hi Rune,

Do you see any signs of general memory leak ("free") on your node?

Anyway there can be no doubt that this happens because the big buffer pool is 
running empty.

We fixed that in commit 4c94cc2d3d57 ("tipc: fall back to smaller MTU if 
allocation of local send skb fails") which was delivered to Linux 4.16.

Do you have any opportunity to apply that patch and try it?
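
As a toy model of the idea in that commit title (this is not the kernel code; the function names, the sizes and the failing allocator are all made up for illustration): try to build the message with a large MTU first, and if a buffer allocation fails, rebuild it as a chain of smaller buffers.

```python
BIG_MTU = 64 * 1024    # hypothetical "large" MTU for node-local sends
SMALL_MTU = 4096       # hypothetical fallback MTU


def limited_alloc(max_size):
    """Return an allocator that fails for requests larger than max_size."""
    def alloc(size):
        if size > max_size:
            raise MemoryError(f"cannot allocate {size} bytes")
        return bytearray(size)
    return alloc


def build_chain(payload, mtu, alloc):
    """Split payload into buffers of at most mtu bytes, allocated via alloc."""
    chain = []
    for off in range(0, len(payload), mtu):
        chunk = payload[off:off + mtu]
        buf = alloc(len(chunk))   # may raise MemoryError
        buf[:] = chunk
        chain.append(buf)
    return chain


def build_with_fallback(payload, alloc, big_mtu=BIG_MTU, small_mtu=SMALL_MTU):
    """Try one large buffer first; on allocation failure, use small fragments."""
    try:
        return build_chain(payload, big_mtu, alloc)
    except MemoryError:
        return build_chain(payload, small_mtu, alloc)
```

With an allocator that refuses anything above 16000 bytes, a 20000-byte payload fails as a single buffer but goes through as five fragments of at most 4096 bytes, which the receiver would then have to reassemble.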

BR
///jon

> -----Original Message-----
> From: Rune Torgersen <ru...@innovsys.com>
> Sent: 17-Oct-19 12:38
> To: 'tipc-discussion@lists.sourceforge.net' <tipc-discuss...@lists.sourceforge.net>
> Subject: [tipc-discussion] Error allocating memeory error when sending RDM message
>
> Hi.
>
> I am running into an issue when sending SOCK_RDM or SOCK_DGRAM
> messages. On a system that has been up for a while (120+ days in this case), I
> cannot send any RDM/DGRAM type TIPC messages that are larger than about
> 16000 bytes (16033+ fails, 15100 and smaller still works).
> Any larger message fails with error code 12: "Cannot allocate memory".
>
> The really odd thing is that it only happens on some connections and not others
> on the same system (for example, sending to TIPC node 103:1003 gets no error,
> while sending to 103:3 gets an error).
> Once it gets into this state, it seems to keep failing on the same
> destination address, and not on others, until the system is rebooted (restarting
> the server-side application makes no difference).
> The sends are done on the same node as the receiver is on.
>
> Kernel is Ubuntu 16.04 LTS 4.4.0-150 in this case, also seen on 161.
>
> Nametable for 103:
> 103        2          2          <1.1.1:2328193343>         2328193344  cluster
> 103        3          3          <1.1.2:3153441800>         3153441801  cluster
> 103        5          5          <1.1.4:269294867>          269294868   cluster
> 103        1002       1002       <1.1.1:490133365>          490133366   cluster
> 103        1003       1003       <1.1.2:2552019732>         2552019733  cluster
> 103        1005       1005       <1.1.4:625110186>          625110187   cluster
>
> _______________________________________________
> tipc-discussion mailing list
> tipc-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion

