Re: Network vm deadlock... solution?

2005-08-02 Thread Sridhar Samudrala
On Tue, 2005-08-02 at 06:54 +1000, Daniel Phillips wrote:
 Hi guys,
 
 Well I have been reading net code seriously for two days, so I am still 
 basically a complete network klutz.  But we have a nasty network-related vm 
 deadlock that needs fixing and there seems to be little choice but to wade in 
 and try to sort things out.
 

We are also working on a similar problem where a set of critical TCP
connections needs to send and receive messages successfully even under
very low memory conditions. But the assumption is that the low memory
situation lasts only for a short time (on the order of a few minutes),
short enough that no TCP timeouts expire, so that normal
connections can recover once the low memory situation is resolved.

 Here is the plan:
 
   * All protocols used on an interface that supports block IO must be
 vm-aware.
 
 If we wish, we can leave it up to the administrator to ensure that only 
 vm-aware protocols are used on an interface that supports block IO, or we can 
 do some automatic checking.
 
   * Any socket to be used for block IO will be marked as a vmhelper.

I am assuming your 'vmhelper' is similar to a critical socket, which can
be marked using a new socket option (e.g. SO_CRITICAL).

 
 The number of protocols that need to have this special knowledge is quite 
 small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others.  We are talking 
 about a line or two of code in each to add the necessary awareness.
 
   * Inside the network driver, when memory is low we will allocate space
 for every incoming packet from a memory reserve, regardless of whether
 it is related to block IO or not.
 
   * Under low memory, we call the protocol layer synchronously instead of
 queuing the packet through softnet.
 
 We do not necessarily have to bypass softnet, since there is a mechanism for 
 throttling packets at this point.  However, there is a big problem with 
 throttling here: we haven't classified the packet yet, so the throttling 
 might discard some block IO packets, which is exactly what we don't want to 
 do under memory pressure.
 
   * The protocol receive handler does the socket lookup, then if memory is
 low, discards any packet not belonging to a vmhelper socket.
 
 Roughly speaking, the driver allocates each skb via:
 
 skb = memory_pressure ? dev_alloc_skb_reserve() : dev_alloc_skb();

Instead of changing all the drivers to make them vm-aware, we could add
a new priority flag (something like GFP_CRITICAL) which can be passed to
__dev_alloc_skb(). dev_alloc_skb() then becomes:

	return __dev_alloc_skb(length, GFP_ATOMIC|GFP_CRITICAL);

Based on the memory pressure condition, the VM can decide whether the skb
needs to be allocated from an emergency reserve.
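
To make the suggestion concrete, here is a minimal userspace sketch of an allocator that treats the flag as permission to use the reserve. It is not kernel code: GFP_CRITICAL is only a proposal in this thread, and every sim_* name, the flag values, and the counter-based "pools" are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits; these values are made up for the sketch. */
#define SIM_GFP_ATOMIC   0x1u
#define SIM_GFP_CRITICAL 0x2u

struct sim_skb {
	bool from_reserve;	/* true if taken from the emergency pool */
	size_t len;
};

struct sim_vm {
	size_t normal_free;	/* buffers left in the normal pool */
	size_t reserve_free;	/* buffers left in the emergency reserve */
};

/* The allocator, not the driver, decides whether to dip into the reserve. */
struct sim_skb *sim_alloc_skb(struct sim_vm *vm, size_t len, unsigned flags)
{
	bool use_reserve;
	struct sim_skb *skb;

	assert(vm != NULL);

	if (vm->normal_free > 0)
		use_reserve = false;
	else if ((flags & SIM_GFP_CRITICAL) && vm->reserve_free > 0)
		use_reserve = true;
	else
		return NULL;	/* normal memory exhausted, no reserve access */

	skb = malloc(sizeof(*skb));
	if (!skb)
		return NULL;

	/* Charge the pool only once the buffer really exists. */
	if (use_reserve)
		vm->reserve_free--;
	else
		vm->normal_free--;

	skb->from_reserve = use_reserve;
	skb->len = len;
	return skb;
}

/* Driver-facing wrapper: every receive allocation may turn out critical. */
struct sim_skb *sim_dev_alloc_skb(struct sim_vm *vm, size_t len)
{
	return sim_alloc_skb(vm, len, SIM_GFP_ATOMIC | SIM_GFP_CRITICAL);
}
```

With this shape the drivers keep calling the same wrapper; only the allocator knows about the reserve, and the skb records where it came from so later stages can test it.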

 
 Then the driver hands off the packet to netif_rx, which does:
 
 if (from_reserve(skb)) {
         netif_receive_skb(skb);
         return;
 }

 And in the protocol handler we have:
 
 if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
         goto drop_the_packet;

I am not sure if we need the from_reserve() checks above.
We have to assume that all incoming packets are critical until we can
find the matching sk in the protocol handler code.
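
The two variants of the drop test can be compared in a small userspace sketch. Everything here is hypothetical (the sim_* names, the table-based socket lookup, the check_reserve switch); it only models the decision made after the socket lookup, not real protocol code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_sock {
	int port;
	bool vmhelper;		/* socket marked as critical for block IO */
};

struct sim_pkt {
	int dst_port;
	bool from_reserve;	/* buffer came from the emergency reserve */
};

enum sim_verdict { SIM_DELIVER, SIM_DROP };

/* Stand-in for the protocol's socket lookup: linear search by port. */
static const struct sim_sock *sim_lookup(const struct sim_sock *tbl,
					 size_t n, int port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].port == port)
			return &tbl[i];
	return NULL;
}

/*
 * check_reserve = true  models the test as quoted above;
 * check_reserve = false models dropping the from_reserve() term, so every
 * non-vmhelper packet is discarded while memory is low.
 */
enum sim_verdict sim_rcv(const struct sim_sock *tbl, size_t n,
			 const struct sim_pkt *pkt,
			 bool memory_pressure, bool check_reserve)
{
	const struct sim_sock *sk;
	bool reserve_term;

	assert(pkt != NULL);

	sk = sim_lookup(tbl, n, pkt->dst_port);
	if (!sk)
		return SIM_DROP;	/* no matching socket at all */

	reserve_term = check_reserve ? pkt->from_reserve : true;
	if (memory_pressure && !sk->vmhelper && reserve_term)
		return SIM_DROP;

	return SIM_DELIVER;
}
```

In this model the variants differ only for a packet that was allocated from normal memory while memory_pressure is set: the quoted test delivers it to a non-vmhelper socket, and the simplified test drops it.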

 
 That is pretty much it.  Now, being a net newbie, it is not entirely clear to 
 me that we can call netif_receive_skb directly when packets are also being 
 queued through the softnet interface.  May I have some guidance on this 
 point, please?
 
 If that works, I am prepared to justify and prove the rest.
 
 Regards,
 
 Daniel
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 



Re: Network vm deadlock... solution?

2005-08-02 Thread Martin J. Bligh


--Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 
23:43:40 +0200):

 Daniel Phillips [EMAIL PROTECTED] :
 [...]
 A point on memory pressure: here, we are not talking about the continuous 
 state of running under heavy load, but rather the microscopically short 
 periods where not a single page of memory is available to normal tasks.  It 
 is when a block IO event happens to land inside one of those microscopically 
 short periods that we run into problems.
 
 You suggested in a previous message to use an emergency allocation pool at
 the driver level. Afaik, 1) the usual network driver can already buffer a
 bit with its Rx descriptor ring and 2) it more or less tries to refill it
 each time napi issues its ->poll() method. So it makes me wonder:
 - have you collected evidence that the drivers actually run out of memory
   in the (microscopical) situation you describe ?

There are other situations where it does (e.g. the swap device dies).

 - instead of modifying each and every driver to be vm aware, why don't
   you hook in net_rx_action() when memory starts to be low ?
 
 Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems redundant
 with the threshold (if (memory_pressure)) used in the Rx path to decide
 that memory is low.

It's send-side, not receive.

M.



Re: Network vm deadlock... solution?

2005-08-02 Thread Daniel Phillips
On Wednesday 03 August 2005 07:43, Francois Romieu wrote:
 Daniel Phillips [EMAIL PROTECTED] :
 [...]

  A point on memory pressure: here, we are not talking about the continuous
  state of running under heavy load, but rather the microscopically short
  periods where not a single page of memory is available to normal tasks. 
  It is when a block IO event happens to land inside one of those
  microscopically short periods that we run into problems.

 You suggested in a previous message to use an emergency allocation pool at
 the driver level. Afaik, 1) the usual network driver can already buffer a
 bit with its Rx descriptor ring and 2) it more or less tries to refill it
 each time napi issues its ->poll() method. So it makes me wonder:
 - have you collected evidence that the drivers actually run out of memory
   in the (microscopical) situation you describe ?

Yes, e.g.:

   http://thunker.thunk.org/pipermail/ksummit-2005-discuss/2005-March/000200.html

and NBD is known to be unreliable for this reason.  I plan to put together
a before-and-after test that everybody can try, but only after I post the
patch for comment.

 - instead of modifying each and every driver to be vm aware, why don't
   you hook in net_rx_action() when memory starts to be low ?

Two reasons:

  * The first handling has to be where the packet is allocated

  * net_rx_action is on the far side of a queue, which would need to be
throttled separately.  But the throttle would not know which packets to
discard, because the packet headers have not been examined yet.
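
The second point can be illustrated with a toy userspace model, which is only a sketch under stated assumptions: the sim_* names, the block_io field, and the fixed admission limit are all hypothetical, and real softnet throttling is more involved than a tail-drop counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_pkt {
	bool block_io;	/* only known once the headers have been examined */
};

/*
 * Throttle ahead of classification: admit the first `limit` packets and
 * tail-drop the rest without looking at them.
 * Returns how many block IO packets were lost.
 */
size_t sim_blind_throttle(const struct sim_pkt *in, size_t n, size_t limit)
{
	size_t lost = 0;

	assert(in != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (i >= limit && in[i].block_io)
			lost++;
	return lost;
}

/*
 * Classify first: discard everything that is not block IO, and count
 * only the block IO packets against the limit.
 * Returns how many block IO packets were lost.
 */
size_t sim_classified_throttle(const struct sim_pkt *in, size_t n,
			       size_t limit)
{
	size_t kept = 0, lost = 0;

	assert(in != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!in[i].block_io)
			continue;	/* discarded under memory pressure */
		if (kept < limit)
			kept++;
		else
			lost++;
	}
	return lost;
}
```

For a burst of three ordinary packets followed by two block IO packets and a limit of two, the blind throttle loses both block IO packets, while the classified one keeps them.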

 Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems
 redundant with the threshold (if (memory_pressure)) used in the Rx path
 to decide that memory is low.

It is not to decide if memory is low, but to tell the vm system that it is
allowed to allocate from the reserve if normal memory is exhausted.

Regards,

Daniel.


Re: Network vm deadlock... solution?

2005-08-02 Thread Daniel Phillips
On Wednesday 03 August 2005 08:39, Martin J. Bligh wrote:
 --Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 
  Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems
  redundant with the threshold (if (memory_pressure)) used in the Rx path
  to decide that memory is low.

 It's send-side, not receive.

Receive side.  Send side also needs reserve+throttling but it is easier 
because we flag packets at allocation time for special handling.

Regards,

Daniel