Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-04-03 Dave Chinner
On Sun, Apr 03, 2011 at 06:32:16PM +0900, KOSAKI Motohiro wrote:
> > On Fri, Apr 01, 2011 at 10:17:56PM +0900, KOSAKI Motohiro wrote:
> > > > > But, I agree that now we have to concern slightly large VM change perhaps
> > > > > (or perhaps not). Ok, it's good opportunity to fill out some thing.
> > > > > Historically, Linux MM has "free memory are waste memory" policy, and it
> > > > > worked completely fine. But now we have a few exceptions.
> > > > >
> > > > > 1) RT, embedded and finance systems. They really hope to avoid reclaim
> > > > >    latency (ie avoid foreground reclaim completely) and they can accept
> > > > >    to make slightly much free pages before memory shortage.
> > > > 
> > > > In general we need a mechanism to ensure we can avoid reclaim during
> > > > critical sections of application. So some way to give some hints to the
> > > > machine to free up lots of memory (/proc/sys/vm/drop_caches is far too
> > > > drastic) may be useful.
> > > 
> > > Exactly.
> > > I've heard multiple times this request from finance people. And I've also
> > > heard the same request from bullet train control software people recently.
> > 
[...]
> > Fundamentally, if you just switch off memory reclaim to avoid the
> > latencies involved with direct memory reclaim, then all you'll get
> > instead is ENOMEM because there's no memory available and none will be
> > reclaimed. That's even more fatal for the system than doing reclaim.
> 
> You have two level oversight.
> 
> Firstly, *ALL* RT application need to cooperate applications, kernel, 
> and other various system level daemons. That's no specific issue of 
> this topic. OK, *IF* RT application run egoistic, a system may hang 
> up easily even through a mere simple busy loop, yes. But who wants to do so?

Sure - that's RT-101. I think I have a good understanding of these
principles after spending 7 years of my life working on wide-area
distributed real-time control systems (think city-scale water and
electricity supply).

> Secondly, You misparsed "avoid direct reclaim" paragraph. We don't talk
> about "avoid direct reclaim even if system memory is not enough", We talk
> about "avoid direct reclaim by preparing before". 

I don't think I misparsed it. I am addressing the "avoid direct
reclaim by preparing before" principle directly. The problem with it
is that just enlarging the free memory pool doesn't guarantee future
allocation success when there are other concurrent allocations
occurring. IOWs, if you don't _reserve_ the free memory for the
critical area in advance then there is no guarantee it will be
available when needed by the critical section.

A simple example is the radix tree node preallocation code
(radix_tree_preload()), which guarantees that inserts succeed while
holding a spinlock. If relying on free memory were sufficient, then
GFP_ATOMIC allocations would be all that is necessary. However, even
that isn't sufficient, as the GFP_ATOMIC reserved pool can be
exhausted by other concurrent GFP_ATOMIC allocations. Hence
preallocation is required before entering the critical section to
guarantee success in all cases.

And to state the obvious: doing allocation before the critical
section will trigger reclaim if necessary so there is no need to
have the application trigger reclaim.
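
The preallocate-then-consume pattern described above can be sketched in user space. This is a hypothetical analogue, not the kernel's radix_tree_preload() code: the names preload_fill(), preload_take(), PRELOAD_MAX and the linked list standing in for the radix tree are assumptions for illustration, and a pthread mutex stands in for the spinlock. All allocation (which may block, or in the kernel trigger reclaim) happens before the lock is taken; inside the critical section nodes come only from the caller's private reserve.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define PRELOAD_MAX 8   /* hypothetical: worst-case nodes one insert may need */

struct node { int key; struct node *next; };

/* Hypothetical per-caller reserve, filled before the critical section. */
struct preload { struct node *slot[PRELOAD_MAX]; int nr; };

/* Step 1 (outside the lock): allocate everything the insert could need.
 * Blocking, and in the kernel reclaim, is allowed here. */
static int preload_fill(struct preload *p, int need)
{
    if (need > PRELOAD_MAX)
        return -1;
    while (p->nr < need) {
        struct node *n = malloc(sizeof(*n));
        if (!n)
            return -1;          /* fail before entering the critical section */
        p->slot[p->nr++] = n;
    }
    return 0;
}

/* Step 2 (inside the lock): take from the private reserve, never allocate. */
static struct node *preload_take(struct preload *p)
{
    assert(p->nr > 0);          /* preload_fill() must have reserved enough */
    return p->slot[--p->nr];
}

/* Return any unused reserve after the critical section. */
static void preload_release(struct preload *p)
{
    while (p->nr > 0)
        free(p->slot[--p->nr]);
}

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head;

static int list_insert(int key)
{
    struct preload p = { .nr = 0 };

    if (preload_fill(&p, 1) != 0) {
        preload_release(&p);    /* free any partial reserve */
        return -1;
    }
    pthread_mutex_lock(&list_lock);
    struct node *n = preload_take(&p);   /* cannot fail: already reserved */
    n->key = key;
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);
    preload_release(&p);
    return 0;
}
```

The design point is that the reserve is private to the caller: unlike a shared pool of free memory (or the GFP_ATOMIC reserves), concurrent callers cannot drain it between the fill and the take.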

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-04-01 Dave Chinner
On Fri, Apr 01, 2011 at 10:17:56PM +0900, KOSAKI Motohiro wrote:
> > > But, I agree that now we have to concern slightly large VM change perhaps
> > > (or perhaps not). Ok, it's good opportunity to fill out some thing.
> > > Historically, Linux MM has "free memory are waste memory" policy, and it
> > > worked completely fine. But now we have a few exceptions.
> > >
> > > 1) RT, embedded and finance systems. They really hope to avoid reclaim
> > >    latency (ie avoid foreground reclaim completely) and they can accept
> > >    to make slightly much free pages before memory shortage.
> > 
> > In general we need a mechanism to ensure we can avoid reclaim during
> > critical sections of application. So some way to give some hints to the
> > machine to free up lots of memory (/proc/sys/vm/drop_caches is far too
> > drastic) may be useful.
> 
> Exactly.
> I've heard multiple times this request from finance people. And I've also
> heard the same request from bullet train control software people recently.

Well, that's enough to make me avoid Japanese trains in future. If
your critical control system has problems with memory reclaim
interfering with its operation, then you are doing something
very, very wrong.

If you have a need to avoid memory allocation latency during
specific critical sections then the critical section needs to:

a) have all its memory preallocated and mlock()d in advance

b) avoid doing anything that requires memory to be
   allocated.

These are basic design rules for time-sensitive applications.
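
Rules a) and b) can be illustrated with a small user-space sketch. It assumes a hypothetical rt_arena type: all memory the critical section may use is allocated, pre-faulted by writing to every page, and locked with mlock() before the critical section starts; inside the critical section rt_arena_take() only carves from that arena and returns NULL rather than calling malloc(). Many real-time applications instead lock the whole process with mlockall(MCL_CURRENT | MCL_FUTURE); this sketch locks only the arena and records whether mlock() succeeded, since it can fail when RLIMIT_MEMLOCK is too small.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical arena holding all memory one critical section may use. */
struct rt_arena {
    unsigned char *base;
    size_t size;
    size_t used;
    int locked;                 /* 1 if mlock() succeeded */
};

/* Rule (a): allocate, pre-fault and mlock() everything in advance. */
static int rt_arena_init(struct rt_arena *a, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        page = 4096;            /* assumption: fall back to 4 KiB pages */
    if (posix_memalign(&mem, (size_t)page, size) != 0)
        return -1;
    memset(mem, 0, size);       /* write every page so it is faulted in now */
    a->base = mem;
    a->size = size;
    a->used = 0;
    /* mlock() may fail (e.g. RLIMIT_MEMLOCK too small); record the result
     * so the caller can refuse to start rather than run unprotected. */
    a->locked = (mlock(mem, size) == 0);
    return 0;
}

/* Rule (b): inside the critical section, carve from the arena only. */
static void *rt_arena_take(struct rt_arena *a, size_t n)
{
    assert(a->base != NULL);
    if (n > a->size)
        return NULL;
    n = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (n > a->size - a->used)
        return NULL;            /* fail fast: never fall back to malloc() */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void rt_arena_destroy(struct rt_arena *a)
{
    if (a->locked)
        munlock(a->base, a->size);
    free(a->base);
    a->base = NULL;
}
```

In this sketch rt_arena_init() runs during start-up, rt_arena_take() is the only allocator used inside the critical section, and a NULL return is treated as a sizing error in the design rather than something to recover from by allocating.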

Fundamentally, if you just switch off memory reclaim to avoid the
latencies involved with direct memory reclaim, then all you'll get
instead is ENOMEM because there's no memory available and none will be
reclaimed. That's even more fatal for the system than doing reclaim.

IMO, you should tell the people requesting stuff like this to
architect their critical sections according to best practices.
Hacking the VM to try to work around badly designed applications is
a sure recipe for disaster...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-31 Dave Chinner
On Fri, Apr 01, 2011 at 08:38:11AM +0530, Balbir Singh wrote:
> * Dave Chinner  [2011-04-01 08:40:33]:
> 
> > On Wed, Mar 30, 2011 at 11:00:26AM +0530, Balbir Singh wrote:
> > > 
> > > The following series implements page cache control; this is a
> > > split-out version of patch 1 of version 3 of the page cache
> > > optimization patches posted earlier. Previous posting:
> > > http://lwn.net/Articles/425851/ and analysis at
> > > http://lwn.net/Articles/419713/
> > > 
> > > Detailed Description
> > > 
> > > This patch implements unmapped page cache control via preferred
> > > page cache reclaim. The current patch hooks into kswapd and reclaims
> > > page cache if the user has requested unmapped page control.
> > > This is useful in the following scenario:
> > > - In a virtualized environment with cache=writethrough, we see
> > >   double caching - (one in the host and one in the guest). As
> > >   we try to scale guests, cache usage across the system grows.
> > >   The goal of this patch is to reclaim page cache when Linux is running
> > >   as a guest and get the host to hold the page cache and manage it.
> > >   There might be temporary duplication, but in the long run, memory
> > >   in the guests would be used for mapped pages.
> > 
> > What does this do that "cache=none" for the VMs and using the page
> > cache inside the guest doesn't achieve? That avoids double caching
> > and doesn't require any new complexity inside the host OS to
> > achieve...
> >
> 
> There was a long discussion on cache=none in the first posting and the
> downsides/impact on throughput. Please see
> http://www.mail-archive.com/kvm@vger.kernel.org/msg30655.html 

All there is in that thread is handwaving about the differences
between cache=none vs cache=writeback behaviour and about the amount
of data loss/corruption when failures occur.  There is only one real
example provided about real world performance in the entire thread,
but the root cause of the performance difference is not analysed,
determined and understood.  Hence I'm not convinced from this thread
that using cache=write* and using this functionality is
anything other than papering over some still unknown problem.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-31 Dave Chinner
On Wed, Mar 30, 2011 at 11:00:26AM +0530, Balbir Singh wrote:
> 
> The following series implements page cache control; this is a
> split-out version of patch 1 of version 3 of the page cache
> optimization patches posted earlier. Previous posting:
> http://lwn.net/Articles/425851/ and analysis at
> http://lwn.net/Articles/419713/
> 
> Detailed Description
> 
> This patch implements unmapped page cache control via preferred
> page cache reclaim. The current patch hooks into kswapd and reclaims
> page cache if the user has requested unmapped page control.
> This is useful in the following scenario:
> - In a virtualized environment with cache=writethrough, we see
>   double caching - (one in the host and one in the guest). As
>   we try to scale guests, cache usage across the system grows.
>   The goal of this patch is to reclaim page cache when Linux is running
>   as a guest and get the host to hold the page cache and manage it.
>   There might be temporary duplication, but in the long run, memory
>   in the guests would be used for mapped pages.

What does this do that "cache=none" for the VMs and using the page
cache inside the guest doesn't achieve? That avoids double caching
and doesn't require any new complexity inside the host OS to
achieve...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com