Re: Network vm deadlock... solution?
On Tue, 2005-08-02 at 06:54 +1000, Daniel Phillips wrote:
> Hi guys,
>
> Well, I have been reading net code seriously for two days, so I am still
> basically a complete network klutz. But we have a nasty network-related vm
> deadlock that needs fixing, and there seems to be little choice but to wade
> in and try to sort things out.

We are also working on a similar problem, where a set of critical TCP
connections needs to successfully send and receive messages even under very
low memory conditions. The assumption is that the low-memory situation lasts
only a short time (on the order of a few minutes), which should not cause any
TCP timeouts to expire, so that normal connections can recover once the
low-memory situation is resolved.

> Here is the plan:
>
> * All protocols used on an interface that supports block IO must be
>   vm-aware. If we wish, we can leave it up to the administrator to ensure
>   that only vm-aware protocols are used on an interface that supports
>   block IO, or we can do some automatic checking.
>
> * Any socket to be used for block IO will be marked as a vmhelper.

I am assuming your "vmhelper" is similar to a critical socket, which could be
marked using a new socket option (e.g. SO_CRITICAL).

> The number of protocols that need to have this special knowledge is quite
> small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others. We are talking
> about a line or two of code in each to add the necessary awareness.
>
> * Inside the network driver, when memory is low we will allocate space for
>   every incoming packet from a memory reserve, regardless of whether it is
>   related to block IO or not.
>
> * Under low memory, we call the protocol layer synchronously instead of
>   queuing the packet through softnet.

We do not necessarily have to bypass softnet, since there is a mechanism for
throttling packets at this point.
However, there is a big problem with throttling here: we haven't classified
the packet yet, so the throttling might discard some block IO packets, which
is exactly what we don't want to do under memory pressure.

> * The protocol receive handler does the socket lookup, then if memory is
>   low, discards any packet not belonging to a vmhelper socket.
>
> Roughly speaking, the driver allocates each skb via:
>
>     skb = memory_pressure ? dev_alloc_skb_reserve() : dev_alloc_skb();

Instead of changing all the drivers to make them vm-aware, we could add a new
priority flag (something like GFP_CRITICAL) which can be passed to
__dev_alloc_skb(). dev_alloc_skb() becomes:

    return __dev_alloc_skb(length, GFP_ATOMIC | GFP_CRITICAL);

Based on the memory-pressure condition, the VM can decide whether the skb
needs to be allocated from an emergency reserve.

> Then the driver hands off the packet to netif_rx, which does:
>
>     if (from_reserve(skb)) {
>             netif_receive_skb(skb);
>             return;
>     }
>
> And in the protocol handler we have:
>
>     if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
>             goto drop_the_packet;

I am not sure if we need the from_reserve() checks above. We have to assume
that all incoming packets are critical until we can find the matching sk in
the protocol handler code.

> That is pretty much it. Now, being a net newbie, it is not entirely clear
> to me that we can call netif_receive_skb directly when packets are also
> being queued through the softnet interface. May I have some guidance on
> this point, please? If that works, I am prepared to justify and prove the
> rest.
>
> Regards,
>
> Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
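The drop test in the protocol handler above is just a three-way condition, and
it is easy to model outside the kernel. Here is a small userspace sketch of
that policy (the struct fields and the is_vmhelper/from_reserve helpers are
assumptions mirroring the hypothetical names in the thread, not kernel API):

```c
#include <stdbool.h>
#include <assert.h>

/* Minimal models of the two objects the check looks at. */
struct sock_model { bool vmhelper; };       /* socket marked for block IO  */
struct skb_model  { bool from_reserve; };   /* skb came from the reserve   */

static bool is_vmhelper(const struct sock_model *sk)  { return sk->vmhelper; }
static bool from_reserve(const struct skb_model *skb) { return skb->from_reserve; }

/* Protocol-handler policy: under memory pressure, a packet that consumed
 * reserve memory is dropped unless it resolves to a vmhelper socket.
 * Returns true when the packet must be dropped. */
static bool should_drop(bool memory_pressure,
                        const struct sock_model *sk,
                        const struct skb_model *skb)
{
    return memory_pressure && !is_vmhelper(sk) && from_reserve(skb);
}
```

The point of the last conjunct is that packets which did not dip into the
reserve cost nothing extra to deliver, so only reserve-backed packets on
non-vmhelper sockets are sacrificed.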
Re: Network vm deadlock... solution?
--Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 23:43:40 +0200):

> Daniel Phillips [EMAIL PROTECTED] :
> [...]
> > A point on memory pressure: here, we are not talking about the
> > continuous state of running under heavy load, but rather the
> > microscopically short periods where not a single page of memory is
> > available to normal tasks. It is when a block IO event happens to land
> > inside one of those microscopically short periods that we run into
> > problems.
>
> You suggested in a previous message to use an emergency allocation pool
> at the driver level. Afaik, 1) the usual network driver can already
> buffer a bit with its Rx descriptor ring and 2) it more or less tries to
> refill it each time napi issues its ->poll() method. So it makes me
> wonder:
>
> - have you collected evidence that the drivers actually run out of
>   memory in the (microscopical) situation you describe ?

There's other situations where it does (i.e. the swap device dies, etc).

> - instead of modifying each and every driver to be vm aware, why don't
>   you hook in net_rx_action() when memory starts to be low ?
>
> Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems
> redundant with the threshold (if (memory_pressure)) used in the Rx path
> to decide that memory is low.

It's send-side, not receive.

M.
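Francois's observation that the Rx descriptor ring "can already buffer a bit"
is true, but the ring only helps for as long as ->poll() can refill it. The
following toy model (illustrative only; the names are not driver API) shows
how the ring drains and the driver does run dry when every refill allocation
fails during a short out-of-memory window:

```c
#include <stdbool.h>
#include <assert.h>

#define RING_SIZE 4             /* descriptors in the toy Rx ring */

static int ring_filled = RING_SIZE;  /* descriptors with a buffer attached */

/* One received packet consumes a descriptor; the ->poll() refill replaces
 * it only when the allocator has memory. Returns false when the ring is
 * empty, i.e. the packet is lost. */
static bool rx_one_packet(bool alloc_ok)
{
    if (ring_filled == 0)
        return false;           /* nothing to receive into */
    ring_filled--;              /* buffer handed up the stack */
    if (alloc_ok)
        ring_filled++;          /* refill succeeded */
    return true;
}
```

So the ring absorbs a burst of RING_SIZE packets, after which reception fails
until an allocation succeeds again; this is exactly the window the reserve is
meant to cover.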
Re: Network vm deadlock... solution?
On Wednesday 03 August 2005 07:43, Francois Romieu wrote:
> Daniel Phillips [EMAIL PROTECTED] :
> [...]
> > A point on memory pressure: here, we are not talking about the
> > continuous state of running under heavy load, but rather the
> > microscopically short periods where not a single page of memory is
> > available to normal tasks. It is when a block IO event happens to land
> > inside one of those microscopically short periods that we run into
> > problems.
>
> You suggested in a previous message to use an emergency allocation pool
> at the driver level. Afaik, 1) the usual network driver can already
> buffer a bit with its Rx descriptor ring and 2) it more or less tries to
> refill it each time napi issues its ->poll() method. So it makes me
> wonder:
>
> - have you collected evidence that the drivers actually run out of
>   memory in the (microscopical) situation you describe ?

Yes, e.g.:

    http://thunker.thunk.org/pipermail/ksummit-2005-discuss/2005-March/000200.html

and NBD is known to be unreliable for this reason. I plan to put together a
before-and-after test that everybody can try, but after I show the patch for
comment.

> - instead of modifying each and every driver to be vm aware, why don't
>   you hook in net_rx_action() when memory starts to be low ?

Two reasons:

* The first handling has to be where the packet is allocated.

* net_rx_action is on the far side of a queue, which would need to be
  throttled separately. But the throttle would not know which packets to
  discard, because the packet headers have not been examined yet.

> Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems
> redundant with the threshold (if (memory_pressure)) used in the Rx path
> to decide that memory is low.

It is not to decide if memory is low, but to tell the vm system that the
allocation is allowed to draw from the reserve if normal memory is exhausted.

Regards,

Daniel.
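The distinction Daniel draws here, that the flag grants permission rather
than detects pressure, can be made concrete with a small userspace model.
The flag name, the two pools, and model_alloc() are all assumptions sketched
from the thread, not kernel API:

```c
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>

#define MODEL_CRITICAL 0x1      /* stand-in for the proposed GFP_CRITICAL */

static int normal_pages  = 0;   /* normal memory already exhausted */
static int reserve_pages = 2;   /* small emergency reserve */

/* Returns a buffer, or NULL when the caller may not touch the reserve.
 * Note the flag plays no role while normal memory is available; it only
 * matters once the normal pool is empty. */
static void *model_alloc(size_t size, int flags)
{
    if (normal_pages > 0) {
        normal_pages--;
        return malloc(size);    /* normal path, flag irrelevant */
    }
    if ((flags & MODEL_CRITICAL) && reserve_pages > 0) {
        reserve_pages--;        /* only critical callers drain the reserve */
        return malloc(size);
    }
    return NULL;                /* ordinary allocation fails under pressure */
}
```

This is why the flag is not redundant with the `if (memory_pressure)` check
in the Rx path: that check chooses a policy, while the flag authorizes the
allocator to use memory that ordinary callers must not see.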
Re: Network vm deadlock... solution?
On Wednesday 03 August 2005 08:39, Martin J. Bligh wrote:
> --Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005):
> > Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems
> > redundant with the threshold (if (memory_pressure)) used in the Rx path
> > to decide that memory is low.
>
> It's send-side, not receive.

Receive side. The send side also needs a reserve plus throttling, but it is
easier, because we flag packets at allocation time for special handling.

Regards,

Daniel