And looking at Ubuntu's git repo for xenial, it appears that patch was never backported.
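A minimal sketch of that check, assuming the usual Launchpad location of the xenial kernel tree (the fix in question is 57d5f64d83ab, referenced further down in the thread):

    git clone git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial
    cd xenial
    # search the xenial history for the upstream fix by its subject;
    # no match means it was never backported
    git log --oneline --grep='allocate user memory with GFP_KERNEL' -- net/tipc/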
-----Original Message-----
From: Rune Torgersen <ru...@innovsys.com>
Sent: Friday, October 18, 2019 08:28

cat /proc/buddyinfo
Node 0, zone    DMA        2      2      1      1      3      0      1      0      1      1      3
Node 0, zone    DMA32   9275  11572    137      6      0      0      0      0      0      0      0
Node 0, zone    Normal 35213  15049    476     11      1      0      1      1      1      0      0
Node 1, zone    Normal  5917  25209    490      8      6      3      1      1      0      0      0

And I'm aware of the check-in, as I reported it. I was under the impression that it
had been backported to the TIPC driver in the Ubuntu 16.04 LTS 4.4.0 branch
(around 4.4.0-110, I think). Either the fix was never incorporated in the 4.4.0
branch, or it was reverted recently.

-----Original Message-----
From: Partha <parthasarathy.bhuvara...@gmail.com>
Sent: Friday, October 18, 2019 08:17

Hi Rune,

Your system's memory seems to be fragmented, and you need to perform forced
reclaim. Can you check the buddy allocator for higher-order allocations?

cat /proc/buddyinfo

BTW, I fixed this in:
57d5f64d83ab ("tipc: allocate user memory with GFP_KERNEL flag")
and it was
Reported-by: Rune Torgersen <ru...@innovsys.com>
It's in upstream as of v4.10-rc3-167-g57d5f64d83ab.

regards
Partha

On 2019-10-17 22:08, Rune Torgersen wrote:
> Looks like I can kind of make it happen on one system now.
> Stopping some programs (no pattern in which ones) makes it work, and starting
> some back up again makes it fail.
>
> The TIPC nametable has 231 entries when failing and 183 entries when succeeding
> (however, on a different system the nametable has 251 entries and it is not
> failing).
>
> How do I look for memory used by TIPC in the kernel?
>
> -----Original Message-----
> From: Rune Torgersen <ru...@innovsys.com>
> Sent: Thursday, October 17, 2019 14:53
>
> I will have to look for leaks next time I can make it happen.
> I was trying stuff and shut down a different program that was unrelated (but
> had some TIPC sockets open on a different address (104)), and as soon as I did,
> the sends started working again.
>
> It is possible that one of those unrelated sockets has something stuck (as one
> of them was only ever used to send RDM messages but nothing ever reads it).
>
> Any suggestions as to what to start looking at (netstat, tipc, tipc_config or
> kernel params) to try to track it down?
>
> The problem with testing a patch (or using Ubuntu 18.04 LTS) is that we cannot
> reliably make it happen.
>
> -----Original Message-----
> From: Jon Maloy <jon.ma...@ericsson.com>
> Sent: Thursday, October 17, 2019 14:35
>
> Hi Rune,
>
> Do you see any signs of a general memory leak ("free") on your node?
>
> Anyway, there can be no doubt that this happens because the big buffer pool is
> running empty.
>
> We fixed that in commit 4c94cc2d3d57 ("tipc: fall back to smaller MTU if
> allocation of local send skb fails"), which was delivered in Linux 4.16.
>
> Do you have any opportunity to apply that patch and try it?
>
> BR
> ///jon
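A rough sketch of how Jon's suggested commit could be pulled into a xenial-based build, assuming you build from the Ubuntu kernel git tree (the "mainline" remote name and the packaging targets are illustrative, and the cherry-pick may need minor conflict fixes on 4.4):

    # inside the Ubuntu xenial kernel source tree
    git remote add mainline https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    git fetch mainline
    # "tipc: fall back to smaller MTU if allocation of local send skb fails" (upstream v4.16)
    git cherry-pick 4c94cc2d3d57
    # rebuild the Ubuntu kernel packages in the usual way, e.g.:
    fakeroot debian/rules clean
    fakeroot debian/rules binary-headers binary-generic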
>> -----Original Message-----
>> From: Rune Torgersen <ru...@innovsys.com>
>> Sent: 17-Oct-19 12:38
>> To: 'tipc-discussion@lists.sourceforge.net' <tipc-discuss...@lists.sourceforge.net>
>> Subject: [tipc-discussion] Error allocating memory error when sending RDM message
>>
>> Hi.
>>
>> I am running into an issue when sending SOCK_RDM or SOCK_DGRAM messages. On a
>> system that has been up for some time (120+ days in this case), I cannot send
>> any RDM/DGRAM-type TIPC messages larger than about 16000 bytes (16033+ fails,
>> 15100 and smaller still works). Any larger message fails with error code 12:
>> "Cannot allocate memory".
>>
>> The really odd thing is that it only happens on some connections and not
>> others on the same system (for example, sending to TIPC address 103:1003 gets
>> no error, while sending to 103:3 gets an error).
>> When it gets into this state, it seems to stay that way forever for the same
>> destination address, and not for others, until the system is rebooted
>> (restarting the server-side application makes no difference).
>> The sends are done on the same node the receiver is on.
>>
>> The kernel is Ubuntu 16.04 LTS 4.4.0-150 in this case; also seen on -161.
>>
>> Nametable for 103:
>> 103    2      2      <1.1.1:2328193343>   2328193344   cluster
>> 103    3      3      <1.1.2:3153441800>   3153441801   cluster
>> 103    5      5      <1.1.4:269294867>    269294868    cluster
>> 103    1002   1002   <1.1.1:490133365>    490133366    cluster
>> 103    1003   1003   <1.1.2:2552019732>   2552019733   cluster
>> 103    1005   1005   <1.1.4:625110186>    625110187    cluster
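As a concrete starting point for the fragmentation check and forced reclaim suggested earlier in the thread, a minimal sketch (the /proc interfaces are standard and need root; the tipc/tipc-config name-table commands are illustrative, depending on which userspace tools are installed):

    # higher-order allocations need non-zero counts in the right-hand buddyinfo columns
    cat /proc/buddyinfo
    # free reclaimable memory, then ask the kernel to compact (defragment) free pages
    sync
    echo 3 > /proc/sys/vm/drop_caches     # drops only clean page cache and slab
    echo 1 > /proc/sys/vm/compact_memory  # requires CONFIG_COMPACTION
    cat /proc/buddyinfo                   # higher-order counts should rise if fragmentation was the issue
    # inspect TIPC publications for the failing type (103)
    tipc nametable show                   # newer iproute2-style tool
    tipc-config -nt                       # legacy tool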