Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-27 Thread William Lee Irwin III
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.

This doesn't crash i386 in qemu here on a port of the quicklist patches
to 2.6.21-rc5-mm2. I suppose I'll have to dump it on some real hardware
to see if I can reproduce it there.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread David Miller
From: William Lee Irwin III <[EMAIL PROTECTED]>
Date: Mon, 26 Mar 2007 18:06:24 -0700

> On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> > b) we understand why the below simple modification crashes i386.
> 
> Full eager zeroing patches not dependent on quicklist code don't crash,
> so there is no latent use-after-free issue covered up by caching. I'll
> help out more on the i386 front as-needed.

I've looked into this a few times and I am quite mystified as
to why that simple test patch crashes.


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread Christoph Lameter
On Mon, 26 Mar 2007, William Lee Irwin III wrote:

> Not that clameter really needs my help, but I agree with his position
> on several fronts, and advocate accordingly, so here is where I'm at.

Yes, thank you. i386 is not my field; I have no interest per se in 
improving i386 performance, and without your help I would have to drop this 
and keep the special casing in SLUB for i386. Generic tlb.h changes may 
also help to introduce quicklists to x86_64. The current quicklist patches 
can only work on the higher levels because ptes are freed via 
tlb_remove_page().



Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread William Lee Irwin III
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> a) it has been demonstrated that this patch is superior to simply removing
>the quicklists and

Not that clameter really needs my help, but I agree with his position
on several fronts, and advocate accordingly, so here is where I'm at.

From prior experience, I believe I know how to extract positive results,
and that's primarily by PTE caching because they're the most frequently
zeroed pagetable nodes. The upper levels of pagetables will remain in
the noise until the leaf level bottleneck is dealt with.

PTEs need a custom tlb.h to deal with the TLB issues noted above; the
asm-generic variant will not suffice. Results above the noise level
need PTE caching. Sparse fault handling (esp. after execve() is done)
is one place in particular where improvements should be most readily
demonstrable, as only single cachelines on each allocated node should
be touched. lmbench should have a fault handling latency test for this.


On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.

Full eager zeroing patches not dependent on quicklist code don't crash,
so there is no latent use-after-free issue covered up by caching. I'll
help out more on the i386 front as-needed.


-- wli


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread Andrew Morton
On Mon, 26 Mar 2007 09:52:17 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Fri, 23 Mar 2007, Andrew Morton wrote:
> 
> > On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL 
> > PROTECTED]> wrote:
> > 
> > > Here are the results of aim9 tests on x86_64. There are some minor 
> > > performance 
> > > improvements and some fluctuations.
> > 
> > There are a lot of numbers there - what do they tell us?
> 
> That there are performance improvements because of quicklists.

Christoph, you can continue to be obtuse, and I can continue to ignore
these patches until

a) it has been demonstrated that this patch is superior to simply removing
   the quicklists and

b) we understand why the below simple modification crashes i386.


diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
  */
 static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
 {
-	struct quicklist *q;
-	void **p = NULL;
-
-	q =&get_cpu_var(quicklist)[nr];
-	p = q->page;
-	if (likely(p)) {
-		q->page = p[0];
-		p[0] = NULL;
-		q->nr_pages--;
-	}
-	put_cpu_var(quicklist);
-	if (likely(p))
-		return p;
-
-	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	void *p = (void *)__get_free_page(flags | __GFP_ZERO);
 	if (ctor && p)
 		ctor(p);
 	return p;
 }
 
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
 {
-	struct quicklist *q;
-	void **p = pp;
-	struct page *page = virt_to_page(p);
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		free_page((unsigned long)p);
-		return;
-	}
-
-	q = &get_cpu_var(quicklist)[nr];
-	p[0] = q->page;
-	q->page = p;
-	q->nr_pages++;
-	put_cpu_var(quicklist);
+	if (dtor)
+		dtor(p);
+	free_page((unsigned long)p);
 }
 
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
-
_




Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread Christoph Lameter
On Mon, 26 Mar 2007, Christoph Lameter wrote:

> > After your patches, x86_64 is using a common quicklist allocator for puds,
> > pmds and pgds and continues to use get_zeroed_page() for ptes.
> 
> x86_64 should be using quicklists for all ptes after this patch. I did not 
> convert pte_free() since it is only used for freeing ptes during races 
> (see __pte_alloc). Since pte_free gets passed a page struct it would require 
> virt_to_page before being put onto the freelist. Not worth doing.
> 
> Hmmm... Then how does x86_64 free the ptes? Seems that we do 
> free_page_and_swap_cache() in tlb_remove_pages. Yup so ptes are not 
> handled which limits the speed improvements that we see.

And if we tried to put the ptes onto quicklists, we would get into 
more difficulties with the TLB shootdown code. Sigh. We cannot easily 
deal with ptes. Quicklists on i386 and x86_64 only work for pgds, puds and 
pmds. And as was pointed out elsewhere in this thread, the performance 
gains are therefore limited on these platforms.


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-26 Thread Christoph Lameter
On Fri, 23 Mar 2007, Andrew Morton wrote:

> On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> 
> > Here are the results of aim9 tests on x86_64. There are some minor 
> > performance 
> > improvements and some fluctuations.
> 
> There are a lot of numbers there - what do they tell us?

That there are performance improvements because of quicklists.

> So what has changed here?  From a quick look it appears that x86_64 is
> using get_zeroed_page() for ptes, puds and pmds and is using a custom
> quicklist for pgds.

x86_64 is only using a list in order to track pgds. There is no 
quicklist without this patchset.
 
> After your patches, x86_64 is using a common quicklist allocator for puds,
> pmds and pgds and continues to use get_zeroed_page() for ptes.

x86_64 should be using quicklists for all ptes after this patch. I did not 
convert pte_free() since it is only used for freeing ptes during races 
(see __pte_alloc). Since pte_free gets passed a page struct it would require 
virt_to_page before being put onto the freelist. Not worth doing.

Hmmm... Then how does x86_64 free the ptes? It seems that we do 
free_page_and_swap_cache() in tlb_remove_page(). Yup, so ptes are not 
handled, which limits the speed improvements that we see.

> My question is pretty simple: how do we justify the retention of this
> custom allocator?

I would expect this functionality (never thought about it as an allocator) 
to extract common code from many arches that use one or the other form of 
preserving zeroed pages for page table pages. I saw lots of arches doing 
the same with some getting into trouble with the page structs. Having a 
common code base that does not have this issue would clean up the kernel 
and deal with the slab issue.

> Because simply removing it is the preferable way of fixing the SLUB
> problem.

That would reduce performance. I did not think that a common feature 
that is used throughout many arches would need rejustification.


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> Here are the results of aim9 tests on x86_64. There are some minor 
> performance 
> improvements and some fluctuations.

There are a lot of numbers there - what do they tell us?

> 2.6.21-rc4 bare
> 2.6.21-rc4 x86_64 quicklist

So what has changed here?  From a quick look it appears that x86_64 is
using get_zeroed_page() for ptes, puds and pmds and is using a custom
quicklist for pgds.

After your patches, x86_64 is using a common quicklist allocator for puds,
pmds and pgds and continues to use get_zeroed_page() for ptes.

Or something totally different, dunno.  I tire.


My question is pretty simple: how do we justify the retention of this
custom allocator?

Because simply removing it is the preferable way of fixing the SLUB
problem.


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Andrew Morton
On Fri, 23 Mar 2007 22:39:24 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> 
> > but it crashes early in the page allocator (i386) and I don't see why.  It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.
> 
> Does CONFIG_DEBUG_PAGEALLOC catch it?

It'll be a while before I can get onto doing anything with this.
I do have an oops trace:


kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 296k freed
Write protecting the kernel read-only data: 921k
BUG: unable to handle kernel paging request at virtual address 00100104
 printing eip:
c015b676
*pde = 
Oops: 0002 [#1]
SMP 
Modules linked in:
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 00010002   (2.6.21-rc4 #6)
EIP is at get_page_from_freelist+0x166/0x3d0
eax: c1b110bc   ebx: 0001   ecx: 00100100   edx: 00200200
esi: c1b11090   edi: c04cc500   ebp: f67d3b88   esp: f67d3b34
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process default.hotplug (pid: 872, ti=f67d2000 task=f6748030 task.ti=f67d2000)
Stack: 0001 0044 c067eae8 0001 0001  c04cc6c0 c04cc4a0 
   0001  000284d0 c04ccb78 0286 0001  f67b6000 
    0001 c04cc4a0 f6748030 84d0 f67d3bcc c015b92e 0044 
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa9/0xd0
 [] show_registers+0x1e9/0x2f0
 [] die+0x115/0x250
 [] do_page_fault+0x27e/0x630
 [] error_code+0x7c/0x84
 [] __alloc_pages+0x4e/0x2f0
 [] pte_alloc_one+0x14/0x20
 [] __pte_alloc+0x1b/0xa0
 [] __handle_mm_fault+0x7fd/0x940
 [] do_page_fault+0x119/0x630
 [] error_code+0x7c/0x84
 [] padzero+0x1f/0x30
 [] load_elf_binary+0x76e/0x1a80
 [] search_binary_handler+0x97/0x220
 [] load_script+0x1d6/0x220
 [] search_binary_handler+0x97/0x220
 [] do_execve+0x14f/0x200
 [] sys_execve+0x2e/0x80
 [] sysenter_past_esp+0x5d/0x99
 ===
Code: 06 8b 4d c0 8b 7d c8 8d 04 81 8d 44 82 20 01 c7 9c 8f 45 dc fa e8 4b f4 fd ff 8b 07 85 c0 74 7b 8b 47 0c 8b 08 8d 70 d4 8b 50 04 <89> 51 04 89 0a c7 40 04 00 02 20 00 c7 00 00 01 10 00 ff 0f 8b
EIP: [] get_page_from_freelist+0x166/0x3d0 SS:ESP 0068:f67d3b34

Not pretty.  That was bare mainline + Christoph's patches + that patch which I sent.
Using http://userweb.kernel.org/~akpm/config-vmm.txt


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread William Lee Irwin III
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>>> afacit that two-year-old, totally-different patch has nothing to do with my
>>> repeatedly-asked question.  It appears to be consolidating three separate
>>> quicklist allocators into one common implementation.
>>> In an attempt to answer my own question (and hence to justify the retention
>>> of this custom allocator) I did this:
>> [... patch changing allocator alloc()/free() to bare page allocations ...]
>>> but it crashes early in the page allocator (i386) and I don't see why.  It
>>> makes me wonder if we have a use-after-free which is hidden by the presence
>>> of the quicklist buffering or something.

On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
>> Sorry I flubbed the first message. Anyway this does mean something is
>> seriously wrong and needs to be debugged. Looking into it now.

On Fri, Mar 23, 2007 at 07:57:07AM -0700, William Lee Irwin III wrote:
> I know what's happening. I just need to catch the culprit.

Are you tripping the BUG_ON() in include/linux/mm.h:256 with
CONFIG_DEBUG_VM set?


-- wli


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Christoph Lameter
Here are the results of aim9 tests on x86_64. There are some minor performance 
improvements and some fluctuations. Page size is only a fourth of that on 
ia64, so the resulting benefit is smaller in terms of saved cacheline fetches.

The benefit is also likely higher on i386 because it can fit twice as many 
page table entries into a page.

 # Test                  bare    quicklist      Change       %  Units
 1 add_double      1096039.60   1096039.60        0.00   0.00%  Thousand Double Precision Additions/second
 2 add_float       1087128.71   1099009.90    11881.19   1.09%  Thousand Single Precision Additions/second
 3 add_long        4019704.43   4374384.24   354679.81   8.82%  Thousand Long Integer Additions/second
 4 add_int         3772277.23   3772277.23        0.00   0.00%  Thousand Integer Additions/second
 5 add_short       3754455.45   3761194.03     6738.58   0.18%  Thousand Short Integer Additions/second
 6 creat-clo        259405.94    267164.18     7758.24   2.99%  File Creations and Closes/second
 7 page_test        233118.81    235970.15     2851.34   1.22%  System Allocations & Pages/second
 8 brk_test        3425247.52   3408457.71   -16789.81  -0.49%  System Memory Allocations/second
 9 jmp_test       21819306.93  21808457.71   -10849.22  -0.05%  Non-local gotos/second
10 signal_test      669154.23    689552.24    20398.01   3.05%  Signal Traps/second
11 exec_test           747.52       743.78       -3.74  -0.50%  Program Loads/second
12 fork_test          8267.33      8457.71      190.38   2.30%  Task Creations/second
13 link_test         43819.31     44318.32      499.01   1.14%  Link/Unlink Pairs/second
28 fun_cal       326463366.34 326559203.98    95837.64   0.03%  Function Calls (no arguments)/second
29 fun_cal1      358906930.69 388202985.07 29296054.38   8.16%  Function Calls (1 argument)/second
30 fun_cal2      356372277.23 356362189.05   -10088.18  -0.00%  Function Calls (2 arguments)/second
31 fun_cal15     156641584.16 156656716.42    15132.26   0.01%  Function Calls (15 arguments)/second
45 mem_rtns_2      1588762.38   1610298.51    21536.13   1.36%  Block Memory Operations/second
46 sort_rtns_1         935.32      1004.98       69.66   7.45%  Sort Operations/second
47 misc_rtns_1       17099.01     17268.66      169.65   0.99%  Auxiliary Loops/second
48 dir_rtns_1      5925742.57   6313432.84   387690.27   6.54%  Directory Operations/second
52 series_1       11469950.50  11625771.14   155820.64   1.36%  Series Evaluations/second
53 shared_memory   1187313.43   1177910.45    -9402.98  -0.79%  Shared Memory Operations/second
54 tcp_test          83183.17     83507.46      324.29   0.39%  TCP/IP Messages/second
55 udp_test         273514.85    269801.00    -3713.85  -1.36%  UDP/IP DataGrams/second
56 fifo_test        741237.62    803930.35    62692.73   8.46%  FIFO Messages/second
57 stream_pipe      885099.01   1058059.70   172960.69  19.54%  Stream Pipe Messages/second
58 dgram_pipe       881782.18    957213.93    75431.75   8.55%  DataGram Pipe Messages/second
59 pipe_cpy        1355891.09   1316766.17   -39124.92  -2.89%  Pipe Messages/second


2.6.21-rc4 bare


  Test Test           Elapsed Iteration        Iteration      Operation
Number Name        Time (sec)     Count Rate (loops/sec) Rate (ops/sec)  Units

     1 add_double        2.02       123         60.89109     1096039.60  Thousand Double Precision Additions/second
     2 add_float         2.02       183         90.59406     1087128.71  Thousand Single Precision Additions/second
     3 add_long          2.03       136         66.99507     4019704.43  Thousand Long Integer Additions/second
     4 add_int           2.02       127         62.87129     3772277.23  Thousand Integer Additions/second
     5 add_short         2.02       316        156.43564     3754455.45  Thousand Short Integer Additions/second
     6 creat-clo         2.02       524        259.40594      259405.94  File Creations and Closes/second
     7 page_test         2.02       277        137.12871      233118.81  System Allocations & Pages/second
     8 brk_test          2.02       407        201.48515     3425247.52  System Memory Allocations/second
     9 jmp_test          2.02     44075      21819.30693    21819306.93  Non-local gotos/second
    10 signal_test       2.01      1345        669.15423      669154.23  Signal Traps/second
    11 exec_test         2.02       302        149.50495         747.52  Program Loads/second
    12 fork_test         2.02       167         82.67327        8267.33  Task Creations/second
    13 link_test         2.02      1405        695.54455       43819.31  Link/Unlink Pairs/second
    14 disk_rr           2.02        65         32.17822      164752.48  Random Disk Reads (K)/second
    15 disk_rw           2.03        55         27.09360      138719.21  Random Disk Writes (K)/second
    16 disk_rd           2.02       467        231.18812     1183683.17  Sequential Disk Reads (K)/second
    17 disk_wrt          2.02        81         40.09901      205306.93  Sequential Disk Writes (K

Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Christoph Lameter
On Thu, 22 Mar 2007, Andrew Morton wrote:

> > About 40% on fork+exit. See 
> > 
> > http://marc.info/?l=linux-ia64&m=110942798406005&w=2
> > 
> 
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question.  It appears to be consolidating three separate
> quicklist allocators into one common implementation.

Yes, it shows the performance gains from the quicklist approach. This is the 
work Robin Holt did on the problem. The problem is how to validate the 
patch, because there should be no change at all on ia64, and on i386 we 
basically measure the overhead of the slab allocations. One could 
measure the impact on x86_64 because this introduces quicklists to that 
platform.

The earlier discussion focused on avoiding zeroing of ptes, as far as I can 
recall.
 
> but it crashes early in the page allocator (i386) and I don't see why.  It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

This was on i386? Could be hidden now by the slab use there.



Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Christoph Lameter
On Fri, 23 Mar 2007, William Lee Irwin III wrote:

> [... patch changing allocator alloc()/free() to bare page allocations ...]
> > but it crashes early in the page allocator (i386) and I don't see why.  It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.

Sorry, there seem to be some email dropouts today. I am getting 
fragments of the slab and quicklist discussions. Maybe I can get the whole 
story from the mailing lists.


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread William Lee Irwin III
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>> afacit that two-year-old, totally-different patch has nothing to do with my
>> repeatedly-asked question.  It appears to be consolidating three separate
>> quicklist allocators into one common implementation.
>> In an attempt to answer my own question (and hence to justify the retention
>> of this custom allocator) I did this:
> [... patch changing allocator alloc()/free() to bare page allocations ...]
>> but it crashes early in the page allocator (i386) and I don't see why.  It
>> makes me wonder if we have a use-after-free which is hidden by the presence
>> of the quicklist buffering or something.

On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
> Sorry I flubbed the first message. Anyway this does mean something is
> seriously wrong and needs to be debugged. Looking into it now.

I know what's happening. I just need to catch the culprit.


-- wli


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread Nick Piggin

Andrew Morton wrote:
>
> but it crashes early in the page allocator (i386) and I don't see why.  It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

Does CONFIG_DEBUG_PAGEALLOC catch it?

--
SUSE Labs, Novell Inc.




Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread William Lee Irwin III
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question.  It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why.  It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

Sorry I flubbed the first message. Anyway this does mean something is
seriously wrong and needs to be debugged. Looking into it now.


-- wli


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-23 Thread William Lee Irwin III
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question.  It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why.  It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.




Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-22 Thread Andrew Morton
On Thu, 22 Mar 2007 23:52:05 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Thu, 22 Mar 2007, Andrew Morton wrote:
> 
> > On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL 
> > PROTECTED]> wrote:
> > 
> > > 1. Proven code from the IA64 arch.
> > > 
> > >   The method used here has been fine tuned for years and
> > >   is NUMA aware. It is based on the knowledge that accesses
> > >   to page table pages are sparse in nature. Taking a page
> > >   off the freelists instead of allocating a zeroed pages
> > >   allows a reduction of number of cachelines touched
> > >   in addition to getting rid of the slab overhead. So
> > >   performance improves.
> > 
> > By how much?
> 
> About 40% on fork+exit. See 
> 
> http://marc.info/?l=linux-ia64&m=110942798406005&w=2
> 

afacit that two-year-old, totally-different patch has nothing to do with my
repeatedly-asked question.  It appears to be consolidating three separate
quicklist allocators into one common implementation.

In an attempt to answer my own question (and hence to justify the retention
of this custom allocator) I did this:


diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
  */
 static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
 {
-	struct quicklist *q;
-	void **p = NULL;
-
-	q =&get_cpu_var(quicklist)[nr];
-	p = q->page;
-	if (likely(p)) {
-		q->page = p[0];
-		p[0] = NULL;
-		q->nr_pages--;
-	}
-	put_cpu_var(quicklist);
-	if (likely(p))
-		return p;
-
-	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	void *p = (void *)__get_free_page(flags | __GFP_ZERO);
 	if (ctor && p)
 		ctor(p);
 	return p;
 }
 
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
 {
-	struct quicklist *q;
-	void **p = pp;
-	struct page *page = virt_to_page(p);
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		free_page((unsigned long)p);
-		return;
-	}
-
-	q = &get_cpu_var(quicklist)[nr];
-	p[0] = q->page;
-	q->page = p;
-	q->nr_pages++;
-	put_cpu_var(quicklist);
+	if (dtor)
+		dtor(p);
+	free_page((unsigned long)p);
 }
 
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
-
_

but it crashes early in the page allocator (i386) and I don't see why.  It
makes me wonder if we have a use-after-free which is hidden by the presence
of the quicklist buffering or something.



Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-22 Thread Christoph Lameter
On Thu, 22 Mar 2007, Andrew Morton wrote:

> On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> 
> > 1. Proven code from the IA64 arch.
> > 
> > The method used here has been fine tuned for years and
> > is NUMA aware. It is based on the knowledge that accesses
> > to page table pages are sparse in nature. Taking a page
> > off the freelists instead of allocating a zeroed pages
> > allows a reduction of number of cachelines touched
> > in addition to getting rid of the slab overhead. So
> > performance improves.
> 
> By how much?

About 40% on fork+exit. See 

http://marc.info/?l=linux-ia64&m=110942798406005&w=2

 


Re: [QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-22 Thread Andrew Morton
On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> 1. Proven code from the IA64 arch.
> 
>   The method used here has been fine tuned for years and
>   is NUMA aware. It is based on the knowledge that accesses
>   to page table pages are sparse in nature. Taking a page
>   off the freelists instead of allocating a zeroed pages
>   allows a reduction of number of cachelines touched
>   in addition to getting rid of the slab overhead. So
>   performance improves.

By how much?


[QUICKLIST 1/5] Quicklists for page table pages V4

2007-03-22 Thread Christoph Lameter
Quicklists for page table pages V4

V3->V4
- Rename quicklist_check to quicklist_trim and allow parameters
  to specify how to clean quicklists.
- Remove dead code

V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly
  and default to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
  http://marc.info/?l=linux-kernel&m=117391339914767&w=2

V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
  http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists
of recently used page table pages to replace the existing (ab)use of the
slab for that purpose.

1. Proven code from the IA64 arch.

The method used here has been fine-tuned for years and
is NUMA aware. It is based on the knowledge that accesses
to page table pages are sparse in nature. Taking a page
off the freelists instead of allocating a zeroed page
reduces the number of cachelines touched, in addition
to getting rid of the slab overhead, so performance
improves. This is particularly useful if pgds contain
standard mappings: we can save on the teardown and setup
of such a page if we have some on the quicklists. This
includes avoiding list operations that are otherwise
necessary on alloc and free to track pgds.

2. Lightweight alternative to using slab to manage page-sized pages

Slab overhead is significant and even page allocator use
is pretty heavyweight. The use of a per-cpu quicklist
means that we touch only two cachelines for an allocation.
There is no need to access the page struct (unless arch code
needs to fiddle around with it). So the fast path just
means bringing in one cacheline at the beginning of the
page. That same cacheline may then be used to store the
page table entry, or a second cacheline may be used
if the page table entry is not in the first cacheline of
the page. The current code will zero the page, which means
touching 32 cachelines (assuming 4 KiB pages and 128-byte
cachelines). We get down from 32 to 2 cachelines in the
fast path.

3. Fix conflicting use of page_structs by slab and arch code.

For example, both arches use the ->private and ->index fields to
create lists of pgds, and i386 also uses other page flags. The slab
can also use the ->private field for allocations that
are larger than page size, which would occur if one enables
debugging. In that case the arch code would overwrite the
pointer to the first page of the compound page allocated
by the slab. SLAB has been modified to not enable
debugging for such slabs (!).

There is the potential for additional conflicts
here, especially since some arches also use page flags to mark
page table pages.

The patch removes these conflicts by no longer using
the slab for these purposes. The page allocator is more
suitable since PAGE_SIZE chunks are its domain.
Then we can start using standard list operations via
page->lru instead of improvising linked lists.

SLUB makes more extensive use of the page struct and so
far had to create workarounds for these slabs. The ->index
field is used for the SLUB freelist. So SLUB cannot allow
the use of a freelist for these slabs and--like slab--
currently does not allow debugging and forces slabs to
only contain a single object (avoids freelist).

If we do not get rid of these issues then both SLAB and SLUB
have to continue to provide special code paths to support these
slabs.

4. i386 gets lightweight NUMA aware management of page table pages.

Note that the use of SLAB on NUMA systems will require the
use of alien caches to efficiently remove remote page
table pages, which (for a PAGE_SIZEd allocation) is a lengthy
and expensive process. With quicklists no alien caches are
needed. Pages can simply be returned to the correct node.

5. x86_64 gets lightweight page table page management.

This will allow x86_64 arch code to repopulate pgds and
other page table entries faster. The list operations for pgds
are reduced in the same way as for i386, to the point where
they are only needed when a pgd is allocated from the page
allocator and when it is freed back to the page allocator.
A pgd can pass through the quicklists without having to be
reinitialized.

6. Consolidation of code from multiple arches

So far arches have their own implementations of quicklist
management. This patch moves that feature into the core, allowing
easier maintenance and consistent management of quickl