Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.

This doesn't crash i386 in qemu here on a port of the quicklist patches
to 2.6.21-rc5-mm2. I suppose I'll have to dump it on some real hardware
to see if I can reproduce it there.

-- wli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
From: William Lee Irwin III <[EMAIL PROTECTED]>
Date: Mon, 26 Mar 2007 18:06:24 -0700

> On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> > b) we understand why the below simple modification crashes i386.
>
> Full eager zeroing patches not dependent on quicklist code don't crash,
> so there is no latent use-after-free issue covered up by caching. I'll
> help out more on the i386 front as-needed.

I've looked into this a few times and I am quite mystified as to why
that simple test patch crashes.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Mon, 26 Mar 2007, William Lee Irwin III wrote:

> Not that clameter really needs my help, but I agree with his position
> on several fronts, and advocate accordingly, so here is where I'm at.

Yes, thank you. i386 is not my field, I have no interest per se in
improving i386 performance, and without your help I would have to drop this
and keep the special casing in SLUB for i386.

Generic tlb.h changes may also help to introduce quicklists to x86_64.
The current quicklist patches can only work on the higher levels because
ptes are freed via tlb_remove_page().
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> a) it has been demonstrated that this patch is superior to simply removing
>    the quicklists and

Not that clameter really needs my help, but I agree with his position
on several fronts, and advocate accordingly, so here is where I'm at.

From prior experience, I believe I know how to extract positive results,
and that's primarily by PTE caching, because PTEs are the most frequently
zeroed pagetable nodes. The upper levels of pagetables will remain in the
noise until the leaf level bottleneck is dealt with. PTEs need a custom
tlb.h to deal with the TLB issues noted above; the asm-generic variant
will not suffice. Results above the noise level need PTE caching. Sparse
fault handling (esp. after execve() is done) is one place in particular
where improvements should be most readily demonstrable, as only single
cachelines on each allocated node should be touched. lmbench should have
a fault handling latency test for this.

On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.

Full eager zeroing patches not dependent on quicklist code don't crash,
so there is no latent use-after-free issue covered up by caching. I'll
help out more on the i386 front as-needed.

-- wli
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Mon, 26 Mar 2007 09:52:17 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 23 Mar 2007, Andrew Morton wrote:
>
> > On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> > > Here are the results of aim9 tests on x86_64. There are some minor performance
> > > improvements and some fluctuations.
> >
> > There are a lot of numbers there - what do they tell us?
>
> That there are performance improvements because of quicklists.

Christoph, you can continue to be obtuse, and I can continue to ignore
these patches until

a) it has been demonstrated that this patch is superior to simply removing
   the quicklists and

b) we understand why the below simple modification crashes i386.

diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
  */
 static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
 {
-	struct quicklist *q;
-	void **p = NULL;
-
-	q =&get_cpu_var(quicklist)[nr];
-	p = q->page;
-	if (likely(p)) {
-		q->page = p[0];
-		p[0] = NULL;
-		q->nr_pages--;
-	}
-	put_cpu_var(quicklist);
-	if (likely(p))
-		return p;
-
-	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	void *p = (void *)__get_free_page(flags | __GFP_ZERO);
 	if (ctor && p)
 		ctor(p);
 	return p;
 }
 
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
 {
-	struct quicklist *q;
-	void **p = pp;
-	struct page *page = virt_to_page(p);
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		free_page((unsigned long)p);
-		return;
-	}
-
-	q = &get_cpu_var(quicklist)[nr];
-	p[0] = q->page;
-	q->page = p;
-	q->nr_pages++;
-	put_cpu_var(quicklist);
+	if (dtor)
+		dtor(p);
+	free_page((unsigned long)p);
 }
 
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
-
_
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Mon, 26 Mar 2007, Christoph Lameter wrote:

> > After your patches, x86_64 is using a common quicklist allocator for puds,
> > pmds and pgds and continues to use get_zeroed_page() for ptes.
>
> x86_64 should be using quicklists for all ptes after this patch. I did not
> convert pte_free() since it is only used for freeing ptes during races
> (see __pte_alloc). Since pte_free gets passed a page struct it would require
> virt_to_page before being put onto the freelist. Not worth doing.
>
> Hmmm... Then how does x86_64 free the ptes? Seems that we do
> free_page_and_swap_cache() in tlb_remove_pages. Yup so ptes are not
> handled which limits the speed improvements that we see.

And if we tried to put the ptes onto quicklists then we would get into
more difficulties with the tlb shootdown code. Sigh. We cannot easily
deal with ptes.

Quicklists on i386 and x86_64 only work for pgds, puds and pmds. And as was
pointed out elsewhere in this thread: The performance gains are therefore
limited on these platforms.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Fri, 23 Mar 2007, Andrew Morton wrote:

> On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > Here are the results of aim9 tests on x86_64. There are some minor performance
> > improvements and some fluctuations.
>
> There are a lot of numbers there - what do they tell us?

That there are performance improvements because of quicklists.

> So what has changed here? From a quick look it appears that x86_64 is
> using get_zeroed_page() for ptes, puds and pmds and is using a custom
> quicklist for pgds.

x86_64 is only using a list in order to track pgds. There is no quicklist
without this patchset.

> After your patches, x86_64 is using a common quicklist allocator for puds,
> pmds and pgds and continues to use get_zeroed_page() for ptes.

x86_64 should be using quicklists for all ptes after this patch. I did not
convert pte_free() since it is only used for freeing ptes during races
(see __pte_alloc). Since pte_free gets passed a page struct it would require
virt_to_page before being put onto the freelist. Not worth doing.

Hmmm... Then how does x86_64 free the ptes? Seems that we do
free_page_and_swap_cache() in tlb_remove_pages. Yup so ptes are not
handled which limits the speed improvements that we see.

> My question is pretty simple: how do we justify the retention of this
> custom allocator?

I would expect this functionality (never thought about it as an allocator)
to extract common code from many arches that use one or the other form of
preserving zeroed pages for page table pages. I saw lots of arches doing
the same with some getting into trouble with the page structs. Having a
common code base that does not have this issue would clean up the kernel
and deal with the slab issue.

> Because simply removing it is the preferable way of fixing the SLUB
> problem.

That would reduce performance. I did not think that a common feature
that is used throughout many arches would need rejustification.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> Here are the results of aim9 tests on x86_64. There are some minor performance
> improvements and some fluctuations.

There are a lot of numbers there - what do they tell us?

> 2.6.21-rc4 bare
> 2.6.21-rc4 x86_64 quicklist

So what has changed here? From a quick look it appears that x86_64 is
using get_zeroed_page() for ptes, puds and pmds and is using a custom
quicklist for pgds.

After your patches, x86_64 is using a common quicklist allocator for puds,
pmds and pgds and continues to use get_zeroed_page() for ptes.

Or something totally different, dunno. I tire.

My question is pretty simple: how do we justify the retention of this
custom allocator?

Because simply removing it is the preferable way of fixing the SLUB
problem.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Fri, 23 Mar 2007 22:39:24 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
>
> > but it crashes early in the page allocator (i386) and I don't see why. It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.
>
> Does CONFIG_DEBUG_PAGEALLOC catch it?

It'll be a while before I can get onto doing anything with this.

I do have an oops trace:

kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 296k freed
Write protecting the kernel read-only data: 921k
BUG: unable to handle kernel paging request at virtual address 00100104
 printing eip:
c015b676
*pde =
Oops: 0002 [#1]
SMP
Modules linked in:
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 00010002 (2.6.21-rc4 #6)
EIP is at get_page_from_freelist+0x166/0x3d0
eax: c1b110bc ebx: 0001 ecx: 00100100 edx: 00200200
esi: c1b11090 edi: c04cc500 ebp: f67d3b88 esp: f67d3b34
ds: 007b es: 007b fs: 00d8 gs: ss: 0068
Process default.hotplug (pid: 872, ti=f67d2000 task=f6748030 task.ti=f67d2000)
Stack: 0001 0044 c067eae8 0001 0001 c04cc6c0 c04cc4a0 0001 000284d0 c04ccb78
       0286 0001 f67b6000 0001 c04cc4a0 f6748030 84d0 f67d3bcc c015b92e 0044
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa9/0xd0
 [] show_registers+0x1e9/0x2f0
 [] die+0x115/0x250
 [] do_page_fault+0x27e/0x630
 [] error_code+0x7c/0x84
 [] __alloc_pages+0x4e/0x2f0
 [] pte_alloc_one+0x14/0x20
 [] __pte_alloc+0x1b/0xa0
 [] __handle_mm_fault+0x7fd/0x940
 [] do_page_fault+0x119/0x630
 [] error_code+0x7c/0x84
 [] padzero+0x1f/0x30
 [] load_elf_binary+0x76e/0x1a80
 [] search_binary_handler+0x97/0x220
 [] load_script+0x1d6/0x220
 [] search_binary_handler+0x97/0x220
 [] do_execve+0x14f/0x200
 [] sys_execve+0x2e/0x80
 [] sysenter_past_esp+0x5d/0x99
 ===
Code: 06 8b 4d c0 8b 7d c8 8d 04 81 8d 44 82 20 01 c7 9c 8f 45 dc fa e8 4b f4 fd ff 8b 07 85 c0 74 7b 8b 47 0c 8b 08 8d 70 d4 8b 50 04 <89> 51 04 89 0a c7 40 04 00 02 20 00 c7 00 00 01 10 00 ff 0f 8b
EIP: [] get_page_from_freelist+0x166/0x3d0 SS:ESP 0068:f67d3b34

Not pretty. That was bare mainline+christoph's patches+that patch which I
sent. Using http://userweb.kernel.org/~akpm/config-vmm.txt
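
[Editorial note, not part of the original message.] One possible reading of the register dump, offered as a sketch rather than a diagnosis: ecx and edx hold 00100100 and 00200200, which coincide with the kernel's list poison constants LIST_POISON1 and LIST_POISON2 (defined in include/linux/poison.h), and the faulting address 00100104 is ecx + 4. That is what one would expect if list_del() ran on a list entry that had already been removed once. The userspace model below only reproduces that arithmetic; the 32-bit struct layout and the function names are assumptions made for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values list_del() stores into a removed entry (include/linux/poison.h). */
#define LIST_POISON1 0x00100100u
#define LIST_POISON2 0x00200200u

/*
 * Assumed 32-bit layout of struct list_head as on i386: two 4-byte
 * pointers. Modeled with uint32_t so the offsets hold on any host.
 */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

/* Model of the state list_del() leaves behind in the removed entry. */
void poison_entry(struct list_head32 *entry)
{
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * Address written by the first store of a list deletion,
 * "next->prev = prev", for an entry whose ->next holds `next`.
 */
uint32_t first_store_address(uint32_t next)
{
	return next + (uint32_t)offsetof(struct list_head32, prev);
}
```

Feeding a poisoned entry's ->next into first_store_address() gives LIST_POISON1 + 4, the same value as the faulting address in the trace. This only shows the numbers are consistent with a double list deletion; it does not identify which code path performed it.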
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>>> afacit that two-year-old, totally-different patch has nothing to do with my
>>> repeatedly-asked question. It appears to be consolidating three separate
>>> quicklist allocators into one common implementation.
>>> In an attempt to answer my own question (and hence to justify the retention
>>> of this custom allocator) I did this:
>> [... patch changing allocator alloc()/free() to bare page allocations ...]
>>> but it crashes early in the page allocator (i386) and I don't see why. It
>>> makes me wonder if we have a use-after-free which is hidden by the presence
>>> of the quicklist buffering or something.

On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
>> Sorry I flubbed the first message. Anyway this does mean something is
>> seriously wrong and needs to be debugged. Looking into it now.

On Fri, Mar 23, 2007 at 07:57:07AM -0700, William Lee Irwin III wrote:
> I know what's happening. I just need to catch the culprit.

Are you tripping the BUG_ON() in include/linux/mm.h:256 with
CONFIG_DEBUG_VM set?

-- wli
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
Here are the results of aim9 tests on x86_64. There are some minor performance
improvements and some fluctuations. Page size is only a fourth of that on
ia64 so the resulting benefit is less in terms of saved cacheline fetches.

The benefit is also likely higher on i386 because it can fit double the
page table entries into a page.

 1 add_double       1096039.60    1096039.60         0.00   0.00% Thousand Double Precision Additions/second
 2 add_float        1087128.71    1099009.90     11881.19   1.09% Thousand Single Precision Additions/second
 3 add_long         4019704.43    4374384.24    354679.81   8.82% Thousand Long Integer Additions/second
 4 add_int          3772277.23    3772277.23         0.00   0.00% Thousand Integer Additions/second
 5 add_short        3754455.45    3761194.03      6738.58   0.18% Thousand Short Integer Additions/second
 6 creat-clo         259405.94     267164.18      7758.24   2.99% File Creations and Closes/second
 7 page_test         233118.81     235970.15      2851.34   1.22% System Allocations & Pages/second
 8 brk_test         3425247.52    3408457.71    -16789.81  -0.49% System Memory Allocations/second
 9 jmp_test        21819306.93   21808457.71    -10849.22  -0.05% Non-local gotos/second
10 signal_test       669154.23     689552.24     20398.01   3.05% Signal Traps/second
11 exec_test            747.52        743.78        -3.74  -0.50% Program Loads/second
12 fork_test           8267.33       8457.71       190.38   2.30% Task Creations/second
13 link_test          43819.31      44318.32       499.01   1.14% Link/Unlink Pairs/second
28 fun_cal        326463366.34  326559203.98     95837.64   0.03% Function Calls (no arguments)/second
29 fun_cal1       358906930.69  388202985.07  29296054.38   8.16% Function Calls (1 argument)/second
30 fun_cal2       356372277.23  356362189.05    -10088.18  -0.00% Function Calls (2 arguments)/second
31 fun_cal15      156641584.16  156656716.42     15132.26   0.01% Function Calls (15 arguments)/second
45 mem_rtns_2       1588762.38    1610298.51     21536.13   1.36% Block Memory Operations/second
46 sort_rtns_1          935.32       1004.98        69.66   7.45% Sort Operations/second
47 misc_rtns_1        17099.01      17268.66       169.65   0.99% Auxiliary Loops/second
48 dir_rtns_1       5925742.57    6313432.84    387690.27   6.54% Directory Operations/second
52 series_1        11469950.50   11625771.14    155820.64   1.36% Series Evaluations/second
53 shared_memory    1187313.43    1177910.45     -9402.98  -0.79% Shared Memory Operations/second
54 tcp_test           83183.17      83507.46       324.29   0.39% TCP/IP Messages/second
55 udp_test          273514.85     269801.00     -3713.85  -1.36% UDP/IP DataGrams/second
56 fifo_test         741237.62     803930.35     62692.73   8.46% FIFO Messages/second
57 stream_pipe       885099.01    1058059.70    172960.69  19.54% Stream Pipe Messages/second
58 dgram_pipe        881782.18     957213.93     75431.75   8.55% DataGram Pipe Messages/second
59 pipe_cpy         1355891.09    1316766.17    -39124.92  -2.89% Pipe Messages/second

2.6.21-rc4 bare

 Test   Test          Elapsed     Iteration  Iteration         Operation
Number  Name          Time (sec)  Count      Rate (loops/sec)  Rate (ops/sec)
     1  add_double    2.02          123         60.89109        1096039.60 Thousand Double Precision Additions/second
     2  add_float     2.02          183         90.59406        1087128.71 Thousand Single Precision Additions/second
     3  add_long      2.03          136         66.99507        4019704.43 Thousand Long Integer Additions/second
     4  add_int       2.02          127         62.87129        3772277.23 Thousand Integer Additions/second
     5  add_short     2.02          316        156.43564        3754455.45 Thousand Short Integer Additions/second
     6  creat-clo     2.02          524        259.40594         259405.94 File Creations and Closes/second
     7  page_test     2.02          277        137.12871         233118.81 System Allocations & Pages/second
     8  brk_test      2.02          407        201.48515        3425247.52 System Memory Allocations/second
     9  jmp_test      2.02        44075      21819.30693       21819306.93 Non-local gotos/second
    10  signal_test   2.01         1345        669.15423         669154.23 Signal Traps/second
    11  exec_test     2.02          302        149.50495            747.52 Program Loads/second
    12  fork_test     2.02          167         82.67327           8267.33 Task Creations/second
    13  link_test     2.02         1405        695.54455          43819.31 Link/Unlink Pairs/second
    14  disk_rr       2.02           65         32.17822         164752.48 Random Disk Reads (K)/second
    15  disk_rw       2.03           55         27.09360         138719.21 Random Disk Writes (K)/second
    16  disk_rd       2.02          467        231.18812        1183683.17 Sequential Disk Reads (K)/second
    17  disk_wrt      2.02           81         40.09901         205306.93 Sequential Disk Writes (K
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, 22 Mar 2007, Andrew Morton wrote:

> > About 40% on fork+exit. See
> >
> > http://marc.info/?l=linux-ia64&m=110942798406005&w=2
> >
>
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.

Yes, it shows the performance gains from the quicklist approach. This is the
work Robin Holt did on the problem. The problem is how to validate the
patch, because there should be no change at all on ia64, and on i386 we
basically measure the overhead of the slab allocations. One could measure
the impact on x86_64 because this introduces quicklists to that platform.

The earlier discussion focused on avoiding zeroing of ptes as far as I can
recall.

> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

This was on i386? Could be hidden now by the slab use there.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Fri, 23 Mar 2007, William Lee Irwin III wrote:

> [... patch changing allocator alloc()/free() to bare page allocations ...]
> > but it crashes early in the page allocator (i386) and I don't see why. It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.

Sorry, there seem to be some email dropouts today. I am getting fragments of
slab and quicklist discussions. Maybe I can get the whole story from the
mailing lists.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>> afacit that two-year-old, totally-different patch has nothing to do with my
>> repeatedly-asked question. It appears to be consolidating three separate
>> quicklist allocators into one common implementation.
>> In an attempt to answer my own question (and hence to justify the retention
>> of this custom allocator) I did this:
> [... patch changing allocator alloc()/free() to bare page allocations ...]
>> but it crashes early in the page allocator (i386) and I don't see why. It
>> makes me wonder if we have a use-after-free which is hidden by the presence
>> of the quicklist buffering or something.

On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
> Sorry I flubbed the first message. Anyway this does mean something is
> seriously wrong and needs to be debugged. Looking into it now.

I know what's happening. I just need to catch the culprit.

-- wli
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
Andrew Morton wrote:

> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

Does CONFIG_DEBUG_PAGEALLOC catch it?

--
SUSE Labs, Novell Inc.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.

Sorry I flubbed the first message. Anyway this does mean something is
seriously wrong and needs to be debugged. Looking into it now.

-- wli
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, 22 Mar 2007 23:52:05 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 22 Mar 2007, Andrew Morton wrote:
>
> > On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> > > 1. Proven code from the IA64 arch.
> > >
> > >    The method used here has been fine tuned for years and
> > >    is NUMA aware. It is based on the knowledge that accesses
> > >    to page table pages are sparse in nature. Taking a page
> > >    off the freelists instead of allocating a zeroed page
> > >    allows a reduction of number of cachelines touched
> > >    in addition to getting rid of the slab overhead. So
> > >    performance improves.
> >
> > By how much?
>
> About 40% on fork+exit. See
>
> http://marc.info/?l=linux-ia64&m=110942798406005&w=2
>

afacit that two-year-old, totally-different patch has nothing to do with my
repeatedly-asked question. It appears to be consolidating three separate
quicklist allocators into one common implementation.

In an attempt to answer my own question (and hence to justify the retention
of this custom allocator) I did this:

diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
  */
 static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
 {
-	struct quicklist *q;
-	void **p = NULL;
-
-	q =&get_cpu_var(quicklist)[nr];
-	p = q->page;
-	if (likely(p)) {
-		q->page = p[0];
-		p[0] = NULL;
-		q->nr_pages--;
-	}
-	put_cpu_var(quicklist);
-	if (likely(p))
-		return p;
-
-	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	void *p = (void *)__get_free_page(flags | __GFP_ZERO);
 	if (ctor && p)
 		ctor(p);
 	return p;
 }
 
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
 {
-	struct quicklist *q;
-	void **p = pp;
-	struct page *page = virt_to_page(p);
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		free_page((unsigned long)p);
-		return;
-	}
-
-	q = &get_cpu_var(quicklist)[nr];
-	p[0] = q->page;
-	q->page = p;
-	q->nr_pages++;
-	put_cpu_var(quicklist);
+	if (dtor)
+		dtor(p);
+	free_page((unsigned long)p);
 }
 
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
-
_

but it crashes early in the page allocator (i386) and I don't see why. It
makes me wonder if we have a use-after-free which is hidden by the presence
of the quicklist buffering or something.
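
[Editorial note, not part of the original message.] The lines removed by the diff above implement a LIFO list of free pages threaded through the first word of each page. The following is a minimal userspace sketch of just that push/pop logic, assuming a single list, a 4096-byte page, and no per-CPU or NUMA handling. The sketch_* names and SKETCH_PAGE_SIZE are hypothetical, and aligned_alloc() plus memset() stand in for __get_free_page(flags | __GFP_ZERO).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Userspace stand-in for one quicklist (per-CPU in the kernel patch). */
struct quicklist {
	void *page;	/* most recently freed page, or NULL */
	int nr_pages;	/* number of pages currently cached */
};

/* Stand-in for __get_free_page(flags | __GFP_ZERO). */
void *sketch_get_zeroed_page(void)
{
	void *p = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

	if (p)
		memset(p, 0, SKETCH_PAGE_SIZE);
	return p;
}

/* Pop a cached page if one exists, otherwise allocate a zeroed page. */
void *sketch_quicklist_alloc(struct quicklist *q)
{
	void **p = q->page;

	if (p) {
		q->page = p[0];	/* next cached page becomes the head */
		p[0] = NULL;	/* clear the link word again */
		q->nr_pages--;
		return p;
	}
	return sketch_get_zeroed_page();
}

/* Push a page onto the list; its first word is reused as the link. */
void sketch_quicklist_free(struct quicklist *q, void *pp)
{
	void **p = pp;

	p[0] = q->page;
	q->page = p;
	q->nr_pages++;
}
```

The fast path reads the list head and the first word of the page and nothing else. It relies on the caller returning pages whose contents are already zero apart from that first word, which is why the allocation side only has to clear p[0].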
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, 22 Mar 2007, Andrew Morton wrote:

> On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > 1. Proven code from the IA64 arch.
> >
> >    The method used here has been fine tuned for years and
> >    is NUMA aware. It is based on the knowledge that accesses
> >    to page table pages are sparse in nature. Taking a page
> >    off the freelists instead of allocating a zeroed page
> >    allows a reduction of number of cachelines touched
> >    in addition to getting rid of the slab overhead. So
> >    performance improves.
>
> By how much?

About 40% on fork+exit. See

http://marc.info/?l=linux-ia64&m=110942798406005&w=2
Re: [QUICKLIST 1/5] Quicklists for page table pages V4
On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> 1. Proven code from the IA64 arch.
>
>    The method used here has been fine tuned for years and
>    is NUMA aware. It is based on the knowledge that accesses
>    to page table pages are sparse in nature. Taking a page
>    off the freelists instead of allocating a zeroed page
>    allows a reduction of number of cachelines touched
>    in addition to getting rid of the slab overhead. So
>    performance improves.

By how much?
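
[Editorial note, not part of the original message.] For a sense of scale on the cacheline claim: the patch description in this thread states that zeroing a page touches 32 cachelines (assuming 128-byte lines), against two on the quicklist fast path. The small sketch below reproduces that arithmetic. The 4096-byte page size is an assumption implied by 32 x 128, the 16384-byte case follows the remark elsewhere in the thread that the x86_64 page is a fourth of the ia64 one, and the identifier names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cachelines touched by the quicklist fast path per the patch description. */
#define QUICKLIST_FAST_PATH_LINES 2

/* Number of cachelines written when a whole page is zeroed. */
size_t cachelines_per_page(size_t page_size, size_t cacheline_size)
{
	return page_size / cacheline_size;
}

/* How many times fewer cachelines the fast path touches than zeroing. */
size_t cacheline_reduction_factor(size_t page_size, size_t cacheline_size)
{
	return cachelines_per_page(page_size, cacheline_size) /
	       QUICKLIST_FAST_PATH_LINES;
}
```

With a 4096-byte page and 128-byte lines this gives 32 lines zeroed, a factor of 16 over the two-line fast path. This counts cachelines only and says nothing about the measured speedup, which is the question being asked above.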
[QUICKLIST 1/5] Quicklists for page table pages V4
Quicklists for page table pages V4

V3->V4
- Rename quicklist_check to quicklist_trim and allow parameters
  to specify how to clean quicklists.
- Remove dead code

V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly
  and default to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
  http://marc.info/?l=linux-kernel&m=117391339914767&w=2

V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
  http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists
of recently used page table pages to replace the existing (ab)use of the
slab for that purpose.

1. Proven code from the IA64 arch.

   The method used here has been fine tuned for years and
   is NUMA aware. It is based on the knowledge that accesses
   to page table pages are sparse in nature. Taking a page
   off the freelists instead of allocating a zeroed page
   allows a reduction of number of cachelines touched
   in addition to getting rid of the slab overhead. So
   performance improves. This is particularly useful if pgds
   contain standard mappings. We can save on the teardown
   and setup of such a page if we have some on the quicklists.
   This includes avoiding list operations that are otherwise
   necessary on alloc and free to track pgds.

2. Light weight alternative to use slab to manage page size pages

   Slab overhead is significant and even page allocator use
   is pretty heavy weight. The use of a per cpu quicklist
   means that we touch only two cachelines for an allocation.
   There is no need to access the page_struct (unless arch code
   needs to fiddle around with it). So the fast path just
   means bringing in one cacheline at the beginning of the
   page. That same cacheline may then be used to store the
   page table entry. Or a second cacheline may be used
   if the page table entry is not in the first cacheline of
   the page. The current code will zero the page which means
   touching 32 cachelines (assuming 128 byte). We get down
   from 32 to 2 cachelines in the fast path.

3. Fix conflicting use of page_structs by slab and arch code.

   For example, both arches use the ->private and ->index field to
   create lists of pgds and i386 also uses other page flags. The slab
   can also use the ->private field for allocations that
   are larger than page size which would occur if one enables
   debugging. In that case the arch code would overwrite the
   pointer to the first page of the compound page allocated
   by the slab. SLAB has been modified to not enable
   debugging for such slabs (!).

   There is the potential for additional conflicts here, especially
   since some arches also use page flags to mark page table
   pages.

   The patch removes these conflicts by no longer using
   the slab for these purposes. The page allocator is more
   suitable since PAGE_SIZE chunks are its domain.
   Then we can start using standard list operations via
   page->lru instead of improvising linked lists.

   SLUB makes more extensive use of the page struct and so
   far had to create workarounds for these slabs. The ->index
   field is used for the SLUB freelist. So SLUB cannot allow
   the use of a freelist for these slabs and, like SLAB,
   currently does not allow debugging and forces slabs to
   only contain a single object (avoids freelist).

   If we do not get rid of these issues then both SLAB and SLUB
   have to continue to provide special code paths to support these
   slabs.

4. i386 gets lightweight NUMA aware management of page table pages.

   Note that the use of SLAB on NUMA systems will require the
   use of alien caches to efficiently remove remote page
   table pages. Which (for a PAGE_SIZEd allocation) is a lengthy
   and expensive process. With quicklists no alien caches are
   needed. Pages can be simply returned to the correct node.

5. x86_64 gets lightweight page table page management.

   This will allow x86_64 arch code to repopulate pgds
   and other page table entries faster. The list operations for pgds
   are reduced in the same way as for i386 to the point where
   a pgd is allocated from the page allocator and when it is
   freed back to the page allocator. A pgd can pass through
   the quicklists without having to be reinitialized.

6. Consolidation of code from multiple arches

   So far arches have their own implementation of quicklist
   management. This patch moves that feature into the core allowing
   an easier maintenance and consistent management of quickl