Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-15 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 06:12:44PM -0700, William Lee Irwin III wrote:
> There are furthermore distinctions to make between fork() and execve().
> fork() stomps over the entire process address space copying pagetables
> en masse. After execve() a process incrementally faults in PTE's one at
> a time. It should be clear that if case analyses are of interest at
> all, fork() will want cache-hot pages (cache-preloaded pages?) where
> such are largely wasted on incremental faults after execve(). The copy
> operations in fork() should probably also be examined in the context of
> shared pagetables at some point.

To make this perfectly clear, we can deal with the varying usage cases
with hot/cold flags to the pagetable allocator functions. Where bulk
copies such as fork() are happening, it makes perfect sense to
precharge the cache by eager zeroing. Where sparse single pte affairs
such as incrementally faulting things in after execve() are involved,
cache cold preconstructed pagetable pages are ideal. Address hints
could furthermore be used to precharge single cachelines (e.g. via
prefetch) in the sparse usage case.
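
A minimal user-space sketch of the hot/cold idea described above. The names `pt_alloc()`, `pt_pool_stock()`, `PT_HOT`, `PT_COLD` and the pool layout are hypothetical and are not kernel interfaces; the page and cacheline sizes are assumptions, and `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PT_PAGE_SIZE 4096                 /* assumed page size */
#define PT_CACHELINE 64                   /* assumed cacheline size */
#define PT_ENTRIES   (PT_PAGE_SIZE / sizeof(unsigned long))
#define PT_POOL_MAX  16

enum pt_hint { PT_HOT, PT_COLD };         /* hypothetical allocator flags */

/* Hypothetical pool of pages that were zeroed ahead of time (cache cold). */
struct pt_pool {
    void *pages[PT_POOL_MAX];
    int nr;
};

/* Add one pre-zeroed page to the pool; returns 0 on success, -1 otherwise. */
int pt_pool_stock(struct pt_pool *pool)
{
    assert(pool != NULL);
    if (pool->nr >= PT_POOL_MAX)
        return -1;
    void *page = calloc(1, PT_PAGE_SIZE);
    if (!page)
        return -1;
    pool->pages[pool->nr++] = page;
    return 0;
}

/*
 * PT_HOT:  bulk users (the fork() case) touch most of the page, so zero it
 *          now and leave the lines in cache.
 * PT_COLD: sparse users (faults after execve()) touch one entry, so hand
 *          out a preconstructed page and prefetch only the cacheline that
 *          holds entry `index`.
 */
void *pt_alloc(struct pt_pool *pool, enum pt_hint hint, size_t index)
{
    assert(pool != NULL);
    if (hint == PT_COLD && pool->nr > 0) {
        char *page = pool->pages[--pool->nr];
        size_t off = (index % PT_ENTRIES) * sizeof(unsigned long);
        off &= ~(size_t)(PT_CACHELINE - 1);   /* round down to line start */
        __builtin_prefetch(page + off, 1);    /* address hint, for write */
        return page;
    }
    void *page = malloc(PT_PAGE_SIZE);
    if (page)
        memset(page, 0, PT_PAGE_SIZE);        /* eager zeroing warms cache */
    return page;
}
```

In this sketch a cold request falls back to eager zeroing when the pool is empty, which is one possible policy among several.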


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-15 Thread Christoph Lameter
On Tue, 13 Mar 2007, Andrew Morton wrote:

> > 1. We need to support other states of pages other than zeroed.
> 
> What does this mean?

pgd are not completely zeroed. They contain mappings that are always 
present. Thus the state is not a zeroed state.
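
To illustrate what a known but non-zero state can look like, here is a user-space sketch of a free list that recycles pgd-like pages. All names (`struct quicklist`, `pgd_alloc_sketch()`, `pgd_free_sketch()`, `QL_KERNEL_START`) and the three-quarters split are assumptions made for illustration, not the patch set's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define QL_PAGE_SIZE    4096
#define QL_WORDS        (QL_PAGE_SIZE / sizeof(void *))
#define QL_KERNEL_START (QL_WORDS * 3 / 4)   /* assumed user/kernel split */

/* Hypothetical list: free pages are chained through their first word. */
struct quicklist {
    void **head;
    int nr_pages;
};

/* Build a fresh page: user slots zero, kernel slots preset (always present). */
static void **pgd_construct(void *kernel_map)
{
    void **pgd = calloc(QL_WORDS, sizeof(void *));
    if (!pgd)
        return NULL;
    for (size_t i = QL_KERNEL_START; i < QL_WORDS; i++)
        pgd[i] = kernel_map;
    return pgd;
}

/* Pop a page that is already in its known state, or construct a new one. */
void **pgd_alloc_sketch(struct quicklist *q, void *kernel_map)
{
    void **pgd = q->head;
    if (!pgd)
        return pgd_construct(kernel_map);
    q->head = pgd[0];   /* unlink from the list */
    pgd[0] = NULL;      /* restore the single word the list borrowed */
    q->nr_pages--;
    return pgd;
}

/*
 * The caller has already cleared its user entries during teardown, so the
 * page goes back on the list with the kernel slots still populated.
 */
void pgd_free_sketch(struct quicklist *q, void **pgd)
{
    assert(pgd != NULL);
    pgd[0] = q->head;
    q->head = pgd;
    q->nr_pages++;
}
```

A page taken from the list needs only one word rewritten, whereas a generic zeroed page would need its kernel slots filled in again.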

> > 2. Prezeroing does not make much sense if a large portion of the
> >    page is being used. Performance is better if the whole page
> >    is zeroed directly before use. Prezeroing only makes sense for sparse
> >    allocations like the page table pages.
> 
> This is not related to the above discussion.

Really? I definitely see the word prezeroing in the discussion.

> > I already tried that 3 years ago and there was *no* benefit for usual
> > users of the page allocator. The advantage exists only if a small
> > portion of the page is used. For example, for one cacheline there was a 4x
> > improvement. See lkml archives for prezeroing.
> 
> Unsurprised.  Were non-temporal stores tried?

Yes, with no material change. The work led to making ia64 use non-temporal
stores for spin unlock, but it was not useful for prezeroing.



Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-15 Thread Christoph Lameter
On Tue, 13 Mar 2007, Andrew Morton wrote:

> > On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <[EMAIL 
> > PROTECTED]> wrote:
> > On Tue, 13 Mar 2007, Andrew Morton wrote:
> > 
> > > Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> > > anyone having tried it properly...
> > 
> > Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?
> 
> Failed to provide us a link to it?

You merged part of it and were involved in the discussions.

General overviews:

http://lwn.net/Articles/117881/
http://lwn.net/Articles/128225/

Details on the problems with prezeroing when multiple cachelines of the
page are touched:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0412/12252.html



Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote:
> I'm trying to remember why we ever would have needed to zero out the
> pagetable pages if we're taking down the whole mm?  Maybe it's
> because "oh, the arch wants to put this page into a quicklist to
> recycle it", which is all rather circular.
> It would be interesting to look at a) leave the page full of random
> garbage if we're releasing the whole mm and b) return it straight to
> the page allocator.

We never did need to modify ptes on exit() or other pagetable prunings
(not that they were ever done outside exit() before 2.6.x). The only
subtlety is that pruning on munmap() needs a TLB flush for the TLB
itself to drop the references to the pages referred to by the PTE's on
pruning in the presence of hardware pagetable walkers (in the exit()
case there are no user execution contexts left to potentially utilize
the dead translations so it's less important). That's handled by
tlb_remove_page() and shouldn't need any updates across such a change.

I believe the zeroing on teardown was largely a result of idiom vs.
any particular need. Essentially using ptep_get_and_clear() to handle
the non-pruning munmap() case in a manner unified with other pagetable
teardowns. Also likely is 2.4.x legacy from when that and possibly
earlier kernels maintained arch-private quicklists for pagetables.

There are furthermore distinctions to make between fork() and execve().
fork() stomps over the entire process address space copying pagetables
en masse. After execve() a process incrementally faults in PTE's one at
a time. It should be clear that if case analyses are of interest at
all, fork() will want cache-hot pages (cache-preloaded pages?) where
such are largely wasted on incremental faults after execve(). The copy
operations in fork() should probably also be examined in the context of
shared pagetables at some point.


-- wli


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Paul Mackerras
Andrew Morton writes:

> Plus, we can get in a situation where we take a cache-cold, known-zero page
> from the pte quicklist when there is a cache-hot, non-zero page sitting in
> the page allocator.  I suspect that zeroing the cache-hot page would take a
> similar amount of time to a single miss against the cache-cold page.

That is certainly the case on powerpc.

> I'm not saying that I _know_ that the quicklists are pointless, but I don't
> think it's established that they are pointful.

I don't see much point to them.  For powerpc, I would rather grab an
arbitrary page and zero it than get a page off a quicklist.

> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a

My recollection was that it wasn't a win, but it was a long time ago...

Paul.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Peter Chubb
> "Jeremy" == Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:


Jeremy> And do the same in pte pages for actual mapped pages?  Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are.
On 32-bit platforms, they're pretty dense.  On IA64, quite a bit
sparser, depending on the workload of course.  I think that's mostly because
of the larger pagesize on IA64 -- with 64k pages, you don't need very
many to map a small object.

I'm hoping IanW can give more details.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread David Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 16:14:35 -0500

> Well you -could- do this:
> 
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
> 
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
> 
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.

Yes, I've even had that idea before.

You can even hide it behind pmd_none() et al., the generic VM
doesn't even have to know that the page table macros are doing
this optimization.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.
>   

And do the same in pte pages for actual mapped pages?  Or do you think
they would be too densely populated for it to be worthwhile?

J



Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 02:07:22PM -0700, David Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 15:21:25 -0500
> 
> > Because the fan-out is large, the bulk of the work is bringing the last
> > layer of the tree into cache to find all the pages in the address
> > space. And there's really no way around that.
> 
> That's right.
> 
> And I will note that historically we used to be much worse
> in this area, as we used to walk the page table tree twice
> on address space teardown (once to hit the PTE entries, once
> to free the page tables).
> 
> Happily it is a one-pass algorithm now.
> 
> But, within active VMA ranges, we do have to walk all
> the bits at least one time.

Well you -could- do this:

- reuse a long in struct page as a used map that divides the page up
  into 32 or 64 segments
- every time you set a PTE, set the corresponding bit in the mask
- when we zap, only visit the regions set in the mask

Thus, you avoid visiting most of a PMD page in the sparse case,
assuming PTEs aren't evenly spread across the PMD.

This might not even be too horrible as the appropriate struct page
should be in cache with the appropriate bits of the mm already locked,
etc.
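
The three steps above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `struct table_page`, `set_entry()` and `zap_table()` are hypothetical names, the `used` field stands in for the spare long in struct page, and the table is assumed to hold 512 eight-byte entries split into 64 segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_TABLE 512   /* assumed: 4K page of 8-byte entries */
#define NR_SEGMENTS    64    /* one mask bit per segment */
#define PER_SEGMENT    (PTRS_PER_TABLE / NR_SEGMENTS)

/* Hypothetical table page with its used map kept alongside. */
struct table_page {
    uint64_t entry[PTRS_PER_TABLE];
    uint64_t used;           /* bit n set: segment n may hold entries */
};

/* Store an entry and mark the segment it falls in. */
void set_entry(struct table_page *t, size_t idx, uint64_t val)
{
    assert(idx < PTRS_PER_TABLE);
    t->entry[idx] = val;
    t->used |= UINT64_C(1) << (idx / PER_SEGMENT);
}

/*
 * Clear all populated entries, visiting only segments flagged in the mask.
 * Returns the number of entries that had to be looked at.
 */
size_t zap_table(struct table_page *t, void (*release)(uint64_t))
{
    size_t visited = 0;
    uint64_t mask = t->used;

    while (mask) {
        unsigned seg = (unsigned)__builtin_ctzll(mask);  /* lowest set bit */
        mask &= mask - 1;                                /* drop that bit */
        for (size_t i = (size_t)seg * PER_SEGMENT;
             i < (size_t)(seg + 1) * PER_SEGMENT; i++) {
            visited++;
            if (t->entry[i]) {
                if (release)
                    release(t->entry[i]);
                t->entry[i] = 0;
            }
        }
    }
    t->used = 0;
    return visited;
}
```

With three entries spread over two segments, a zap looks at 16 slots instead of all 512. The saving disappears when entries are spread evenly across the table, which is the caveat raised in the text.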

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread David Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 15:21:25 -0500

> Because the fan-out is large, the bulk of the work is bringing the last
> layer of the tree into cache to find all the pages in the address
> space. And there's really no way around that.

That's right.

And I will note that historically we used to be much worse
in this area, as we used to walk the page table tree twice
on address space teardown (once to hit the PTE entries, once
to free the page tables).

Happily it is a one-pass algorithm now.

But, within active VMA ranges, we do have to walk all
the bits at least one time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 01:17:00PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> >   
> >> Nick Piggin wrote:
> >> 
> >>> However we still have to visit those to-be-unmapped parts of the page
> >>> table,
> >>> to find the pages and free them. So we still at least need to bring it
> >>> into
> >>> cache for the read... at which point, the store probably isn't a big
> >>> burden.
> >>>   
> >> Why not try to find a place to stash a linklist pointer and link them
> >> all together?  Saves the pulldown pagetable walk altogether.
> >> 
> >
> > Because we'd need one link per mm that a page is mapped in?
> >   
> 
> Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

Ahh, I think the issue is that we have to walk the page tables to drop
the reference count of the _actual pages_ they point to. The page
tables themselves could all be put on a list or two lists (one for
PMDs, one for everything else), but that wouldn't really be a win over
just walking the tree, especially given the extra list maintenance.

Because the fan-out is large, the bulk of the work is bringing the last
layer of the tree into cache to find all the pages in the address
space. And there's really no way around that.
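
A small worked example of the fan-out argument, assuming 4K pages and 512 entries per table (both are assumptions; real values depend on the architecture). The helper `tables_per_level()` is hypothetical and simply counts how many table pages each level needs for a contiguous mapping.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For a contiguous mapping of `bytes`, compute how many table pages each
 * level needs. out[0] is the last (leaf) level, out[levels - 1] the top.
 */
void tables_per_level(uint64_t bytes, uint64_t page_size, uint64_t fanout,
                      int levels, uint64_t out[])
{
    assert(page_size > 0 && fanout > 1);
    uint64_t n = (bytes + page_size - 1) / page_size;   /* data pages */

    for (int l = 0; l < levels; l++) {
        n = (n + fanout - 1) / fanout;   /* tables needed to point at n items */
        out[l] = n;
    }
}
```

For a 1 GiB mapping with these assumptions the leaf level needs 512 table pages while each level above it needs one, so nearly all of the cache traffic in a teardown walk comes from the last layer.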

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > However we still have to visit those to-be-unmapped parts of the page
> > table,
> > to find the pages and free them. So we still at least need to bring it
> > into
> > cache for the read... at which point, the store probably isn't a big
> > burden.
> 
> Why not try to find a place to stash a linklist pointer and link them
> all together?  Saves the pulldown pagetable walk altogether.

Because we'd need one link per mm that a page is mapped in?

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
>   
>> Nick Piggin wrote:
>> 
>>> However we still have to visit those to-be-unmapped parts of the page
>>> table,
>>> to find the pages and free them. So we still at least need to bring it
>>> into
>>> cache for the read... at which point, the store probably isn't a big
>>> burden.
>>>   
>> Why not try to find a place to stash a linklist pointer and link them
>> all together?  Saves the pulldown pagetable walk altogether.
>> 
>
> Because we'd need one link per mm that a page is mapped in?
>   

Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

J


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> However we still have to visit those to-be-unmapped parts of the page
> table,
> to find the pages and free them. So we still at least need to bring it
> into
> cache for the read... at which point, the store probably isn't a big
> burden.

Why not try to find a place to stash a linklist pointer and link them
all together?  Saves the pulldown pagetable walk altogether.

J


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>> Andrew Morton wrote:
>>> It would be interesting to look at a) leave the page full of random garbage
>>> if we're releasing the whole mm and b) return it straight to the page allocator.
>>
>> Well we have the 'fullmm' case, which avoids all the locked pte operations
>> (for those architectures where hardware pt walking requires atomicity).
>
> I suspect there are some tlb operations which could be skipped in that case
> too.

Depends on the tlb flush implementation. The generic one doesn't look like
it is all that smart about optimising the fullmm case. It does skip some
tlb flushing though.

>> However we still have to visit those to-be-unmapped parts of the page table
>> to find the pages and free them. So we still at least need to bring it into
>> cache for the read... at which point, the store probably isn't a big burden.
>
> It means all that data has to be written back.  Yes, I expect it'll prove
> to be less costly than the initial load.

Still, it is something we could try.

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> >>We don't actually have to zap_pte_range the entire page table in
> >>order to free it (IIRC we used to have to, before the 4lpt patches).
> > 
> > 
> > I'm trying to remember why we ever would have needed to zero out the 
> > pagetable
> > pages if we're taking down the whole mm?  Maybe it's because "oh, the
> > arch wants to put this page into a quicklist to recycle it", which is
> > all rather circular.
> > 
> > It would be interesting to look at a) leave the page full of random garbage
> > if we're releasing the whole mm and b) return it straight to the page 
> > allocator.
> 
> Well we have the 'fullmm' case, which avoids all the locked pte operations
> (for those architectures where hardware pt walking requires atomicity).

I suspect there are some tlb operations which could be skipped in that case
too.

> However we still have to visit those to-be-unmapped parts of the page table
> to find the pages and free them. So we still at least need to bring it into
> cache for the read... at which point, the store probably isn't a big burden.

It means all that data has to be written back.  Yes, I expect it'll prove
to be less costly than the initial load.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>> We don't actually have to zap_pte_range the entire page table in
>> order to free it (IIRC we used to have to, before the 4lpt patches).
>
> I'm trying to remember why we ever would have needed to zero out the pagetable
> pages if we're taking down the whole mm?  Maybe it's because "oh, the
> arch wants to put this page into a quicklist to recycle it", which is
> all rather circular.
>
> It would be interesting to look at a) leave the page full of random garbage
> if we're releasing the whole mm and b) return it straight to the page allocator.

Well we have the 'fullmm' case, which avoids all the locked pte operations
(for those architectures where hardware pt walking requires atomicity).

However we still have to visit those to-be-unmapped parts of the page table,
to find the pages and free them. So we still at least need to bring it into
cache for the read... at which point, the store probably isn't a big burden.

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> We don't actually have to zap_pte_range the entire page table in
> order to free it (IIRC we used to have to, before the 4lpt patches).

I'm trying to remember why we ever would have needed to zero out the pagetable
pages if we're taking down the whole mm?  Maybe it's because "oh, the
arch wants to put this page into a quicklist to recycle it", which is
all rather circular.

It would be interesting to look at a) leave the page full of random garbage
if we're releasing the whole mm and b) return it straight to the page allocator.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> On Tue, 13 Mar 2007, Andrew Morton wrote:
> 
> > Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> > anyone having tried it properly...
> 
> Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?

Failed to provide us a link to it?


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:

On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
Andrew Morton wrote:


On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:


...



Page allocator still requires interrupts to be disabled, which this doesn't.




it is worthwhile.



If you want a zeroed page for pagecache and someone has just stuffed a
known-zero, cache-hot page into the pagetable quicklists, you have good
reason to be upset.


The thing is, pagetable pages are the one really good exception to the
rule that we should keep cache hot and initialise-on-demand. They
typically are fairly sparsely populated and sparsely accessed. Even
for last level page tables, I think it is reasonable to assume they will
usually be pretty cold.



eh?  I'd have thought that a pte page which has just gone through
zap_pte_range() will very often have a _lot_ of hot cachelines, and
that's a common case.

Still.   It's pretty easy to test.


Well I guess that would be the case if you had just unmapped a 4MB
chunk that was pretty dense with pages.

My malloc seems to allocate and free in blocks of 128K, so that's
only going to give us 3% of the last level pte being cache hot when
it gets freed. Not sure what common mmap(file) access patterns
look like.

The majority of programs I run have a smattering of llpt pages
pretty sparsely populated, covering text, libraries, heap, stack,
vdso.

We don't actually have to zap_pte_range the entire page table in
order to free it (IIRC we used to have to, before the 4lpt patches).

But yeah let's see some tests. I would definitely want to avoid this
extra layer of complexity if it is just as good to return the pages
to the pcp lists.


Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
page.  But it needed too much support in core VM to bother.  Since then
we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
anyone having tried doing it on x86 with non-temporal stores.


You can win on specifically constructed benchmarks, easily.

But considering all the other problems you're going to introduce, we'd need
a significant win on a significant something, IMO.

You waste memory bandwidth. You also use more CPU and memory cycles
speculatively, ergo you waste more power.



Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
anyone having tried it properly...


--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 04:17:26 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> On Tue, 13 Mar 2007, Andrew Morton wrote:
> 
> > > On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL 
> > > PROTECTED]> wrote:
> > > Page table pages have the characteristics that they are typically zero
> > > or in a known state when they are freed.
> > 
> > Well if they're zero then perhaps they should be released to the page 
> > allocator to satisfy the next __GFP_ZERO request.  If that request is 
> > for a pagetable page, we break even (except we get to remove 
> > special-case code).  If that __GFP_ZERO allocation was for some
> > application other than for a pagetable, we win.
> 
> Nope, that won't work.
> 
> 1. We need to support other states of pages other than zeroed.

What does this mean?

> 2. Prezeroing does not make much sense if a large portion of the
>    page is being used. Performance is better if the whole page
>    is zeroed directly before use. Prezeroing only makes sense for sparse
>    allocations like the page table pages.

This is not related to the above discussion.

> > (Will require some work in the page allocator)
> > (That work will open the path to using the idle thread to prezero pages)
> 
> I already tried that 3 years ago and there was *no* benefit for usual
> users of the page allocator. The advantage exists only if a small
> portion of the page is used. For example, for one cacheline there was a 4x
> improvement. See lkml archives for prezeroing.

Unsurprised.  Were non-temporal stores tried?


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Christoph Lameter
On Tue, 13 Mar 2007, Andrew Morton wrote:

> Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> anyone having tried it properly...

Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?




Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Christoph Lameter
On Tue, 13 Mar 2007, Andrew Morton wrote:

> > On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL 
> > PROTECTED]> wrote:
> > Page table pages have the characteristics that they are typically zero
> > or in a known state when they are freed.
> 
> Well if they're zero then perhaps they should be released to the page 
> allocator to satisfy the next __GFP_ZERO request.  If that request is 
> for a pagetable page, we break even (except we get to remove 
> special-case code).  If that __GFP_ZERO allocation was for some
> application other than for a pagetable, we win.

Nope that won't work.

1. We need to support other states of pages other than zeroed.

2. Prezeroing does not make much sense if a large portion of the
   page is being used. Performance is better if the whole page
   is zeroed directly before use. Prezeroing only makes sense for sparse
   allocations like the page table pages.

> (Will require some work in the page allocator)
> (That work will open the path to using the idle thread to prezero pages)

I already tried that 3 years ago and there was *no* benefit for usual
users of the page allocator. The advantage exists only if a small
portion of the page is used. F.e. for one cacheline there was a 4x
improvement. See lkml archives for prezeroing.
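
As a rough illustration of the quicklist idea under discussion (a user-space
sketch, not the code from the patch set): freed pages that are already back in
their known state are kept on a private list, so that reusing one costs a
single store instead of clearing the whole page. The names struct quicklist,
quicklist_alloc(), quicklist_free() and page_is_zero() are chosen here for
illustration, and calloc() merely stands in for a zeroing page allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical quicklist: a stack of free pages that are all zero except
 * for the first word, which links to the next free page. */
struct quicklist {
    void *head;
    int nr_pages;
};

/* Pop a page; fall back to a freshly zeroed allocation when the list is empty. */
static void *quicklist_alloc(struct quicklist *q)
{
    void **page = q->head;

    if (page) {
        q->head = page[0];
        page[0] = NULL;   /* restore the all-zero state: one store, one cacheline */
        q->nr_pages--;
        return page;
    }
    return calloc(1, PAGE_SIZE);
}

/* Push a page the caller guarantees is already back in the zeroed state. */
static void quicklist_free(struct quicklist *q, void *p)
{
    void **page = p;

    assert(p != NULL);
    page[0] = q->head;    /* reuse the first word as the list link */
    q->head = page;
    q->nr_pages++;
}

/* Helper for checking the "known state" invariant. */
static int page_is_zero(const void *p)
{
    static const unsigned char zero[PAGE_SIZE];

    return memcmp(p, zero, PAGE_SIZE) == 0;
}
```

In this sketch only the first cacheline of a recycled page is touched on
allocation, which is the sparse-use case where prezeroing is said to pay off.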




Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> ...
>
> >>Page allocator still requires interrupts to be disabled, which this doesn't.

> >>it is worthwhile.
> > 
> > 
> > If you want a zeroed page for pagecache and someone has just stuffed a
> > known-zero, cache-hot page into the pagetable quicklists, you have good
> > reason to be upset.
> 
> The thing is, pagetable pages are the one really good exception to the
> rule that we should keep cache hot and initialise-on-demand. They
> typically are fairly sparsely populated and sparsely accessed. Even
> for last level page tables, I think it is reasonable to assume they will
> usually be pretty cold.

eh?  I'd have thought that a pte page which has just gone through
zap_pte_range() will very often have a _lot_ of hot cachelines, and
that's a common case.

Still.   It's pretty easy to test.

> > 
> > Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
> > fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> > page.  But it needed too much support in core VM to bother.  Since then
> > we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
> > anyone having tried doing it on x86 with non-temporal stores.
> 
> You can win on specifically constructed benchmarks, easily.
> 
> But considering all the other problems you're going to introduce, we'd need
> a significant win on a significant something, IMO.
> 
> You waste memory bandwidth. You also use more CPU and memory cycles
> speculatively, ergo you waste more power.

Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
anyone having tried it properly...


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>>
>> Page allocator still requires interrupts to be disabled, which this doesn't.
>
> Bah.  How many cli/sti statements fit into a single cachemiss?

On a Pentium 4? ;)

Sure, that is a minor detail, considering that you'll usually be allocating
an order of magnitude or three more anon/pagecache pages than page tables.

>> Considering there isn't much else that frees known zeroed pages, I wonder if
>> it is worthwhile.
>
> If you want a zeroed page for pagecache and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you have good
> reason to be upset.

The thing is, pagetable pages are the one really good exception to the
rule that we should keep cache hot and initialise-on-demand. They
typically are fairly sparsely populated and sparsely accessed. Even
for last level page tables, I think it is reasonable to assume they will
usually be pretty cold.

And you want to allocate cache cold pages as well, for the same reasons
(you want to keep your cache hot pages for when they actually will be
used - eg. for the anon/pagecache itself).

> In fact, if you want a _non_-zeroed page and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you still have
> reason to be upset.  You *want* that cache-hot page.
>
> Generally, all these little private lists of pages (such as the ones which
> slab had/has) are a bad deal.  Cache effects preponderate and I do think
> we're generally better off tossing the things into a central pool.

For slab I understand. And a lot of users of slab constructors were also
silly, precisely because we should initialise on demand to keep the cache
hits up.

But cold(ish?) pagetable quicklists make sense, IMO (that is, if you *must*
avoid using slab).

>> Last time the zeroidle discussion came up was IIRC not actually real performance
>> gain, just cooking the 1024 CPU threaded pagefault numbers ;)
>
> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
> fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> page.  But it needed too much support in core VM to bother.  Since then
> we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
> anyone having tried doing it on x86 with non-temporal stores.

You can win on specifically constructed benchmarks, easily.

But considering all the other problems you're going to introduce, we'd need
a significant win on a significant something, IMO.

You waste memory bandwidth. You also use more CPU and memory cycles
speculatively, ergo you waste more power.

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL 
> >>PROTECTED]> wrote:
> >>Page table pages have the characteristics that they are typically zero
> >>or in a known state when they are freed.
> > 
> > 
> > Well if they're zero then perhaps they should be released to the page allocator
> > to satisfy the next __GFP_ZERO request.  If that request is for a pagetable
> > page, we break even (except we get to remove special-case code).  If that
> > __GFP_ZERO allocation was for some application other than for a pagetable, we
> > win.
> > 
> > iow, can we just nuke 'em?
> 
> Page allocator still requires interrupts to be disabled, which this doesn't.

Bah.  How many cli/sti statements fit into a single cachemiss?

> Considering there isn't much else that frees known zeroed pages, I wonder if
> it is worthwhile.

If you want a zeroed page for pagecache and someone has just stuffed a
known-zero, cache-hot page into the pagetable quicklists, you have good
reason to be upset.

In fact, if you want a _non_-zeroed page and someone has just stuffed a
known-zero, cache-hot page into the pagetable quicklists, you still have
reason to be upset.  You *want* that cache-hot page.

Generally, all these little private lists of pages (such as the ones which
slab had/has) are a bad deal.  Cache effects preponderate and I do think
we're generally better off tossing the things into a central pool.

Plus, we can get in a situation where we take a cache-cold, known-zero page
from the pte quicklist when there is a cache-hot, non-zero page sitting in
the page allocator.  I suspect that zeroing the cache-hot page would take a
similar amount of time to a single miss against the cache-cold page.

I'm not saying that I _know_ that the quicklists are pointless, but I don't
think it's established that they are pointful.

ISTR that experiments with removing the i386 quicklists made zero
difference, but that was an awfully long time ago.  Significantly, it
predated per-cpu-pages..


> Last time the zeroidle discussion came up was IIRC not actually real performance
> gain, just cooking the 1024 CPU threaded pagefault numbers ;)

Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
page.  But it needed too much support in core VM to bother.  Since then
we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
anyone having tried doing it on x86 with non-temporal stores.
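
For readers unfamiliar with the technique, here is a minimal user-space sketch
of zeroing a page with non-temporal stores. It is an illustration only, not
kernel code and not anything posted in this thread: the function name
zero_page_nontemporal() is made up, a 4 KiB page is assumed, and on builds
without SSE2 the sketch simply falls back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define PAGE_SIZE 4096

/* Zero one page. With SSE2 the streaming stores bypass the cache, so the
 * zeroed page does not evict hot data; otherwise fall back to memset(). */
static void zero_page_nontemporal(void *page)
{
    /* Streaming 16-byte stores require a 16-byte aligned destination. */
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = page;

    for (size_t i = 0; i < PAGE_SIZE / sizeof(*p); i++)
        _mm_stream_si128(&p[i], zero);
    _mm_sfence();   /* order the streaming stores before later stores */
#else
    memset(page, 0, PAGE_SIZE);
#endif
}
```

The trade-off is the one argued over in this thread: the page ends up zeroed
but cache-cold, which only helps if the next user touches little of it.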



Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
>> Page table pages have the characteristics that they are typically zero
>> or in a known state when they are freed.
>
> Well if they're zero then perhaps they should be released to the page allocator
> to satisfy the next __GFP_ZERO request.  If that request is for a pagetable
> page, we break even (except we get to remove special-case code).  If that
> __GFP_ZERO allocation was for some application other than for a pagetable, we
> win.
>
> iow, can we just nuke 'em?

Page allocator still requires interrupts to be disabled, which this doesn't.

Considering there isn't much else that frees known zeroed pages, I wonder if
it is worthwhile.

Last time the zeroidle discussion came up was IIRC not actually real performance
gain, just cooking the 1024 CPU threaded pagefault numbers ;)

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> Page table pages have the characteristics that they are typically zero
> or in a known state when they are freed.

Well if they're zero then perhaps they should be released to the page allocator
to satisfy the next __GFP_ZERO request.  If that request is for a pagetable
page, we break even (except we get to remove special-case code).  If that
__GFP_ZERO allocation was for some application other than for a pagetable, we
win.

iow, can we just nuke 'em?

(Will require some work in the page allocator)
(That work will open the path to using the idle thread to prezero pages)


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>> Andrew Morton wrote:
>>>> On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>>
>> ...
>>
>>>> Page allocator still requires interrupts to be disabled, which this doesn't.
>>
>>>> it is worthwhile.
>>>
>>> If you want a zeroed page for pagecache and someone has just stuffed a
>>> known-zero, cache-hot page into the pagetable quicklists, you have good
>>> reason to be upset.
>>
>> The thing is, pagetable pages are the one really good exception to the
>> rule that we should keep cache hot and initialise-on-demand. They
>> typically are fairly sparsely populated and sparsely accessed. Even
>> for last level page tables, I think it is reasonable to assume they will
>> usually be pretty cold.
>
> eh?  I'd have thought that a pte page which has just gone through
> zap_pte_range() will very often have a _lot_ of hot cachelines, and
> that's a common case.
>
> Still.   It's pretty easy to test.


Well I guess that would be the case if you had just unmapped a 4MB
chunk that was pretty dense with pages.

My malloc seems to allocate and free in blocks of 128K, so that's
only going to give us 3% of the last level pte being cache hot when
it gets freed. Not sure what common mmap(file) access patterns
look like.
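
The 3% figure above can be reproduced with a few lines of arithmetic. The
sketch below assumes the i386 non-PAE geometry (4 KiB pages and 4-byte PTEs,
hence 1024 PTEs per last-level table, each table covering 4 MiB); those
constants and the helper names are assumptions made for illustration, not
something stated in the mail.

```c
#include <assert.h>

/* Assumed i386 non-PAE geometry: 4 KiB pages, 4-byte PTEs. */
enum {
    PAGE_BYTES    = 4096,
    PTE_BYTES     = 4,
    PTES_PER_PAGE = PAGE_BYTES / PTE_BYTES,   /* 1024 entries per table */
};

/* Bytes of virtual address space mapped by one last-level page table. */
static unsigned long pte_page_coverage(void)
{
    return (unsigned long)PTES_PER_PAGE * PAGE_BYTES;
}

/* Percentage of one table's PTEs touched when unmapping `bytes`
 * (page-aligned, and no larger than the table's coverage). */
static double pte_percent_touched(unsigned long bytes)
{
    assert(bytes % PAGE_BYTES == 0);
    assert(bytes <= pte_page_coverage());
    return 100.0 * (double)(bytes / PAGE_BYTES) / PTES_PER_PAGE;
}
```

With these assumptions a 128K block spans 32 of the 1024 PTEs, i.e. 3.125%,
while a dense 4MB unmap touches every entry of the table.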

The majority of programs I run have a smattering of llpt pages
pretty sparsely populated, covering text, libraries, heap, stack,
vdso.

We don't actually have to zap_pte_range the entire page table in
order to free it (IIRC we used to have to, before the 4lpt patches).

But yeah let's see some tests. I would definitely want to avoid this
extra layer of complexity if it is just as good to return the pages
to the pcp lists.


>>> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
>>> fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
>>> page.  But it needed too much support in core VM to bother.  Since then
>>> we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
>>> anyone having tried doing it on x86 with non-temporal stores.
>>
>> You can win on specifically constructed benchmarks, easily.
>>
>> But considering all the other problems you're going to introduce, we'd need
>> a significant win on a significant something, IMO.
>>
>> You waste memory bandwidth. You also use more CPU and memory cycles
>> speculatively, ergo you waste more power.
>
> Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> anyone having tried it properly...


--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Tue, 13 Mar 2007, Andrew Morton wrote:
> 
> > Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> > anyone having tried it properly...
> 
> Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?

Failed to provide us a link to it?


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> We don't actually have to zap_pte_range the entire page table in
> order to free it (IIRC we used to have to, before the 4lpt patches).

I'm trying to remember why we ever would have needed to zero out the pagetable
pages if we're taking down the whole mm?  Maybe it's because oh, the
arch wants to put this page into a quicklist to recycle it, which is
all rather circular.

It would be interesting to look at a) leave the page full of random garbage
if we're releasing the whole mm and b) return it straight to the page allocator.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:
>> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>> We don't actually have to zap_pte_range the entire page table in
>> order to free it (IIRC we used to have to, before the 4lpt patches).
>
> I'm trying to remember why we ever would have needed to zero out the pagetable
> pages if we're taking down the whole mm?  Maybe it's because oh, the
> arch wants to put this page into a quicklist to recycle it, which is
> all rather circular.
>
> It would be interesting to look at a) leave the page full of random garbage
> if we're releasing the whole mm and b) return it straight to the page allocator.


Well we have the 'fullmm' case, which avoids all the locked pte operations
(for those architectures where hardware pt walking requires atomicity).

However we still have to visit those to-be-unmapped parts of the page table,
to find the pages and free them. So we still at least need to bring it into
cache for the read... at which point, the store probably isn't a big burden.

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Andrew Morton
> On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin [EMAIL PROTECTED] wrote:
> Andrew Morton wrote:
> > > On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin [EMAIL PROTECTED] wrote:
> > > We don't actually have to zap_pte_range the entire page table in
> > > order to free it (IIRC we used to have to, before the 4lpt patches).
> >
> > I'm trying to remember why we ever would have needed to zero out the
> > pagetable
> > pages if we're taking down the whole mm?  Maybe it's because oh, the
> > arch wants to put this page into a quicklist to recycle it, which is
> > all rather circular.
> >
> > It would be interesting to look at a) leave the page full of random garbage
> > if we're releasing the whole mm and b) return it straight to the page
> > allocator.
>
> Well we have the 'fullmm' case, which avoids all the locked pte operations
> (for those architectures where hardware pt walking requires atomicity).

I suspect there are some tlb operations which could be skipped in that case
too.

> However we still have to visit those to-be-unmapped parts of the page table
> to find the pages and free them. So we still at least need to bring it into
> cache for the read... at which point, the store probably isn't a big burden.

It means all that data has to be written back.  Yes, I expect it'll prove
to be less costly than the initial load.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Nick Piggin

Andrew Morton wrote:

> > On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin [EMAIL PROTECTED] wrote:
> > Andrew Morton wrote:
> > >
> > > It would be interesting to look at a) leave the page full of random garbage
> > > if we're releasing the whole mm and b) return it straight to the page
> > > allocator.
> >
> > Well we have the 'fullmm' case, which avoids all the locked pte operations
> > (for those architectures where hardware pt walking requires atomicity).
>
> I suspect there are some tlb operations which could be skipped in that case
> too.


Depends on the tlb flush implementation. The generic one doesn't look like
it is all that smart about optimising the fullmm case. It does skip some
tlb flushing though.


> > However we still have to visit those to-be-unmapped parts of the page table
> > to find the pages and free them. So we still at least need to bring it into
> > cache for the read... at which point, the store probably isn't a big burden.
>
> It means all that data has to be written back.  Yes, I expect it'll prove
> to be less costly than the initial load.


Still, it is something we could try.

--
SUSE Labs, Novell Inc.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Nick Piggin wrote:
> However we still have to visit those to-be-unmapped parts of the page table,
> to find the pages and free them. So we still at least need to bring it into
> cache for the read... at which point, the store probably isn't a big burden.

Why not try to find a place to stash a linklist pointer and link them
all together?  Saves the pulldown pagetable walk altogether.

J


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > However we still have to visit those to-be-unmapped parts of the page table,
> > to find the pages and free them. So we still at least need to bring it into
> > cache for the read... at which point, the store probably isn't a big burden.
>
> Why not try to find a place to stash a linklist pointer and link them
> all together?  Saves the pulldown pagetable walk altogether.

Because we'd need one link per mm that a page is mapped in?

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> > Nick Piggin wrote:
> > > However we still have to visit those to-be-unmapped parts of the page table,
> > > to find the pages and free them. So we still at least need to bring it into
> > > cache for the read... at which point, the store probably isn't a big burden.
> >
> > Why not try to find a place to stash a linklist pointer and link them
> > all together?  Saves the pulldown pagetable walk altogether.
>
> Because we'd need one link per mm that a page is mapped in?

Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

J


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 01:17:00PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> > > Nick Piggin wrote:
> > > > However we still have to visit those to-be-unmapped parts of the page table,
> > > > to find the pages and free them. So we still at least need to bring it into
> > > > cache for the read... at which point, the store probably isn't a big burden.
> > >
> > > Why not try to find a place to stash a linklist pointer and link them
> > > all together?  Saves the pulldown pagetable walk altogether.
> >
> > Because we'd need one link per mm that a page is mapped in?
>
> Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

Ahh, I think the issue is that we have to walk the page tables to drop
the reference count of the _actual pages_ they point to. The page
tables themselves could all be put on a list or two lists (one for
PMDs, one for everything else), but that wouldn't really be a win over
just walking the tree, especially given the extra list maintenance.

Because the fan-out is large, the bulk of the work is bringing the last
layer of the tree into cache to find all the pages in the address
space. And there's really no way around that.
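
To put rough numbers on the fan-out argument above, here is a minimal
sketch (not from the original message). It assumes a fan-out of 512
entries per table, as with 4K pages and 8-byte entries, and a contiguous
mapping; tables_at_level() is a hypothetical helper, not a kernel
function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of table pages needed at a given level to map npages contiguous
 * pages, where level 0 is the last layer (the PTE pages) and each higher
 * level indexes the tables of the level below it.
 */
size_t tables_at_level(size_t npages, size_t fanout, unsigned level)
{
    size_t n = npages;

    assert(fanout > 1);
    /* Each step up the tree divides the count by the fan-out, rounding up. */
    for (unsigned i = 0; i <= level; i++)
        n = (n + fanout - 1) / fanout;
    return n;
}
```

Under those assumptions, 1 GiB of densely mapped 4K pages (262144 pages)
needs 512 tables at the last layer but only one table at each level above
it, so nearly all of the cache misses in a teardown come from the last
layer.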

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread David Miller
From: Matt Mackall [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 15:21:25 -0500

> Because the fan-out is large, the bulk of the work is bringing the last
> layer of the tree into cache to find all the pages in the address
> space. And there's really no way around that.

That's right.

And I will note that historically we used to be much worse
in this area, as we used to walk the page table tree twice
on address space teardown (once to hit the PTE entries, once
to free the page tables).

Happily it is a one-pass algorithm now.

But, within active VMA ranges, we do have to walk all
the bits at least one time.
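
As a toy illustration of the one-pass shape described above (a sketch
only, not the kernel's actual teardown code): the two-level tree, the
tiny fan-out, and every name below are hypothetical. Each leaf table is
drained and then freed before the walk moves on, so no second walk is
needed to free the tables.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 8   /* tiny fan-out, for illustration only */

/* Hypothetical two-level tree: a directory of leaf tables. */
struct leaf_table { int mapped[FANOUT]; };            /* nonzero = page mapped */
struct directory  { struct leaf_table *leaf[FANOUT]; };

/*
 * One pass: for each populated directory slot, release every mapped page
 * in the leaf table, then free the leaf table itself before moving on.
 * Returns the number of pages released; *tables_freed counts leaf tables.
 */
size_t teardown_one_pass(struct directory *dir, size_t *tables_freed)
{
    size_t pages = 0;

    assert(dir != NULL && tables_freed != NULL);
    *tables_freed = 0;
    for (size_t i = 0; i < FANOUT; i++) {
        struct leaf_table *lt = dir->leaf[i];

        if (!lt)
            continue;               /* nothing mapped under this slot */
        for (size_t j = 0; j < FANOUT; j++) {
            if (lt->mapped[j]) {
                lt->mapped[j] = 0;  /* stand-in for dropping the page reference */
                pages++;
            }
        }
        free(lt);                   /* table freed in the same pass */
        dir->leaf[i] = NULL;
        (*tables_freed)++;
    }
    return pages;
}
```

Every entry of every populated leaf table is still read once, which is
the part that cannot be avoided within active ranges.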


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 02:07:22PM -0700, David Miller wrote:
> From: Matt Mackall [EMAIL PROTECTED]
> Date: Tue, 13 Mar 2007 15:21:25 -0500
>
> > Because the fan-out is large, the bulk of the work is bringing the last
> > layer of the tree into cache to find all the pages in the address
> > space. And there's really no way around that.
>
> That's right.
>
> And I will note that historically we used to be much worse
> in this area, as we used to walk the page table tree twice
> on address space teardown (once to hit the PTE entries, once
> to free the page tables).
>
> Happily it is a one-pass algorithm now.
>
> But, within active VMA ranges, we do have to walk all
> the bits at least one time.

Well you -could- do this:

- reuse a long in struct page as a used map that divides the page up
  into 32 or 64 segments
- every time you set a PTE, set the corresponding bit in the mask
- when we zap, only visit the regions set in the mask

Thus, you avoid visiting most of a PMD page in the sparse case,
assuming PTEs aren't evenly spread across the PMD.

This might not even be too horrible as the appropriate struct page
should be in cache with the appropriate bits of the mm already locked,
etc.
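
The three steps above could be sketched roughly as follows. This is a
userspace model under stated assumptions, not kernel code: struct pt_page
stands in for a pagetable page plus the word borrowed from struct page,
the table is assumed to hold 512 entries split into 64 segments, and
pt_set() and pt_zap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512                       /* assumed table size */
#define MAP_BITS          64                        /* one bit per segment */
#define ENTRIES_PER_SEG   (ENTRIES_PER_TABLE / MAP_BITS)

/* Hypothetical stand-in for a pagetable page and its used map. */
struct pt_page {
    uint64_t entry[ENTRIES_PER_TABLE];
    uint64_t used_map;                              /* bit n set: segment n was written */
};

/* Set an entry and record its segment in the used map. */
void pt_set(struct pt_page *pt, size_t idx, uint64_t val)
{
    assert(idx < ENTRIES_PER_TABLE);
    pt->entry[idx] = val;
    pt->used_map |= (uint64_t)1 << (idx / ENTRIES_PER_SEG);
}

/*
 * Zap: visit only the segments whose bit is set. Returns the number of
 * entries examined; *freed counts the populated entries that were cleared.
 */
size_t pt_zap(struct pt_page *pt, size_t *freed)
{
    size_t visited = 0;

    *freed = 0;
    for (size_t seg = 0; seg < MAP_BITS; seg++) {
        if (!(pt->used_map & ((uint64_t)1 << seg)))
            continue;                               /* untouched segment: skip it */
        for (size_t i = seg * ENTRIES_PER_SEG; i < (seg + 1) * ENTRIES_PER_SEG; i++) {
            visited++;
            if (pt->entry[i]) {
                pt->entry[i] = 0;
                (*freed)++;
            }
        }
    }
    pt->used_map = 0;
    return visited;
}
```

With three entries set in two segments, the zap examines 16 entries
rather than all 512. The saving disappears once entries are spread
across most segments, which is the caveat about even spreading above.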

-- 
Mathematics is the supreme nostalgia of our time.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.
   

And do the same in pte pages for actual mapped pages?  Or do you think
they would be too densely populated for it to be worthwhile?

J



Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread David Miller
From: Matt Mackall [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 16:14:35 -0500

> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.

Yes, I've even had that idea before.

You can even hide it behind pmd_none() et al., the generic VM
doesn't even have to know that the page table macros are doing
this optimization.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Peter Chubb
>>>>> "Jeremy" == Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

Jeremy> And do the same in pte pages for actual mapped pages?  Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are.
On 32-bit platforms, they're pretty dense.  On IA64, quite a bit
sparser, depending on the workload of course.  I think that's mostly because
of the larger pagesize on IA64 -- with 64k pages, you don't need very
many to map a small object.

I'm hoping IanW can give more details.
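
A back-of-the-envelope sketch of the page-size effect described above
(not part of the original message, and not a measurement). It assumes a
pagetable page is exactly one page in size, 4-byte PTEs with 4K pages and
8-byte PTEs with 64K pages; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* PTEs needed to map an object of obj_bytes using pages of page_bytes. */
size_t ptes_needed(size_t obj_bytes, size_t page_bytes)
{
    assert(page_bytes > 0);
    return (obj_bytes + page_bytes - 1) / page_bytes;   /* round up */
}

/* Entries held by one pagetable page, assuming a table fills one page. */
size_t entries_per_table(size_t page_bytes, size_t pte_bytes)
{
    assert(pte_bytes > 0);
    return page_bytes / pte_bytes;
}
```

Under those assumptions a 256 KiB object uses 64 of the 1024 slots in a
pte page with 4K pages, but only 4 of 8192 slots with 64K pages, so the
same object leaves the larger table far sparser.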


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Paul Mackerras
Andrew Morton writes:

> Plus, we can get in a situation where we take a cache-cold, known-zero page
> from the pte quicklist when there is a cache-hot, non-zero page sitting in
> the page allocator.  I suspect that zeroing the cache-hot page would take a
> similar amount of time to a single miss against the cache-cold page.

That is certainly the case on powerpc.

> I'm not saying that I _know_ that the quicklists are pointless, but I don't
> think it's established that they are pointful.

I don't see much point to them.  For powerpc, I would rather grab an
arbitrary page and zero it than get a page off a quicklist.

> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a

My recollection was that it wasn't a win, but it was a long time ago...

Paul.


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote:
> I'm trying to remember why we ever would have needed to zero out the
> pagetable pages if we're taking down the whole mm?  Maybe it's
> because oh, the arch wants to put this page into a quicklist to
> recycle it, which is all rather circular.
> It would be interesting to look at a) leave the page full of random
> garbage if we're releasing the whole mm and b) return it straight to
> the page allocator.

We never did need to modify ptes on exit() or other pagetable prunings
(not that they were ever done outside exit() before 2.6.x). The only
subtlety is that pruning on munmap() needs a TLB flush for the TLB
itself to drop the references to the pages referred to by the PTE's on
pruning in the presence of hardware pagetable walkers (in the exit()
case there are no user execution contexts left to potentially utilize
the dead translations so it's less important). That's handled by
tlb_remove_page() and shouldn't need any updates across such a change.

I believe the zeroing on teardown was largely a result of idiom vs.
any particular need. Essentially using ptep_get_and_clear() to handle
the non-pruning munmap() case in a manner unified with other pagetable
teardowns. Also likely is 2.4.x legacy from when that and possibly
earlier kernels maintained arch-private quicklists for pagetables.

There are furthermore distinctions to make between fork() and execve().
fork() stomps over the entire process address space copying pagetables
en masse. After execve() a process incrementally faults in PTE's one at
a time. It should be clear that if case analyses are of interest at
all, fork() will want cache-hot pages (cache-preloaded pages?) where
such are largely wasted on incremental faults after execve(). The copy
operations in fork() should probably also be examined in the context of
shared pagetables at some point.


-- wli