[Fedora PATCH] Improve Resource Counter Scalability
This patch was sent to me by Balbir Singh (cc'd), who worked on the original patch. The patch results in a massive increase in performance on a 64p/32G system. I have successfully compiled and tested it on fedora-latest.

From the upstream commit:

Data from Prarit (kernel compile with make -j64 on a 64 CPU/32G machine)

For a single run

Without patch

real    27m8.988s
user    87m24.916s
sys     382m6.037s

With patch

real    4m18.607s
user    84m58.943s
sys     50m52.682s

With config turned off

real    4m54.972s
user    90m13.456s
sys     50m19.711s

NOTE: The data looks counterintuitive due to the increased performance with the patch, even over the config being turned off. We probably need more runs, but so far all testing has shown that the patches definitely help.

---

Backport 0c3e73e84fe3f64cf1c2e8bb4e91e8901cbcdc38
From: Balbir Singh <bal...@linux.vnet.ibm.com>
(memcg: improve resource counter scalability) to 2.6.31.

It is a very useful patch for non-users of the memory control group, as it reduces the overhead quite significantly.
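To put the single-run numbers in proportion, here is a small C helper that converts time(1)-style durations such as "27m8.988s" into seconds and divides "before" by "after". The function names are hypothetical and the helper only does arithmetic on the figures quoted above; it says nothing about run-to-run variance, which the note above already flags.

```c
#include <assert.h>
#include <stdio.h>

/* Converts a time(1)-style duration such as "27m8.988s" into seconds.
 * Returns -1.0 if the string is not in <minutes>m<seconds>s form. */
static double duration_seconds(const char *s)
{
	int minutes;
	double seconds;

	assert(s != NULL);
	if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
		return -1.0;
	return minutes * 60.0 + seconds;
}

/* Ratio before/after; values above 1.0 mean "after" took less time. */
static double time_ratio(const char *before, const char *after)
{
	double b = duration_seconds(before);
	double a = duration_seconds(after);

	return (b > 0.0 && a > 0.0) ? b / a : -1.0;
}
```

For example, time_ratio("27m8.988s", "4m18.607s") is about 6.3 (wall clock), time_ratio("382m6.037s", "50m52.682s") is about 7.5 (system time), and time_ratio("4m54.972s", "4m18.607s") is about 1.14, which is the counterintuitive "faster than config off" result mentioned in the note.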
Signed-off-by: Balbir Singh <bal...@linux.vnet.ibm.com>
---
 mm/memcontrol.c |  127 ++-
 1 files changed, 106 insertions(+), 21 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..4821be0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@ struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -66,6 +67,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_MAPPED_FILE,  /* # of pages charged as file rss */
 	MEM_CGROUP_STAT_PGPGIN_COUNT,	/* # of pages paged in */
 	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
+	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
 
 	MEM_CGROUP_STAT_NSTATS,
 };
@@ -219,11 +221,24 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 
+static void mem_cgroup_swap_statistics(struct mem_cgroup *mem,
+					 bool charge)
+{
+	int val = (charge) ? 1 : -1;
+	struct mem_cgroup_stat *stat = &mem->stat;
+	struct mem_cgroup_stat_cpu *cpustat;
+	int cpu = get_cpu();
+
+	cpustat = &stat->cpustat[cpu];
+	__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_SWAPOUT, val);
+	put_cpu();
+}
+
 static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
 					 struct page_cgroup *pc,
 					 bool charge)
 {
-	int val = (charge)? 1 : -1;
+	int val = (charge) ? 1 : -1;
 	struct mem_cgroup_stat *stat = &mem->stat;
 	struct mem_cgroup_stat_cpu *cpustat;
 	int cpu = get_cpu();
@@ -354,6 +369,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 	return ret;
 }
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -996,9 +1016,11 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 	VM_BUG_ON(css_is_removed(&mem->css));
 
 	while (1) {
-		int ret;
+		int ret = 0;
 		bool noswap = false;
 
+		if (mem_cgroup_is_root(mem))
+			goto done;
 		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
 		if (likely(!ret)) {
 			if (!do_swap_account)
@@ -1046,6 +1068,7 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 			goto nomem;
 		}
 	}
+done:
 	return 0;
 nomem:
 	css_put(&mem->css);
@@ -1119,9 +1142,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 	lock_page_cgroup(pc);
 	if (unlikely(PageCgroupUsed(pc))) {
 		unlock_page_cgroup(pc);
-		res_counter_uncharge(&mem->res, PAGE_SIZE);
-		if (do_swap_account)
-			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+		if (!mem_cgroup_is_root(mem)) {
+			res_counter_uncharge(&mem->res, PAGE_SIZE);
+			if (do_swap_account)
+				res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+		}
 		css_put(&mem->css);
 		return;
 	}
@@ -1178,7 +1203,8 @@ static int mem_cgroup_move_account(struct page_cgroup *pc,
 	if (pc->mem_cgroup != from)
 		goto out;
-
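The two ideas in the patch can be modelled in user space: (1) pages charged to the root group skip the shared resource counter entirely (mem_cgroup_is_root() followed by goto done), and (2) the new swap-out statistic is kept in per-CPU slots that are only summed when read. The sketch below is a simplified, hypothetical model (all sketch_* names are made up for illustration): it has no locking, no hierarchy walk and no get_cpu()/put_cpu(), whereas the real res_counter takes a spinlock on every charge, which is the contention the root bypass avoids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values; not taken from the kernel. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_NR_CPUS   4

/* Stand-in for struct res_counter. The real one is protected by a spinlock
 * that every charging CPU must take; this model omits the lock. */
struct sketch_res_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Stand-in for struct mem_cgroup with a per-CPU swap-out statistic. */
struct sketch_mem_cgroup {
	struct sketch_res_counter res;
	long swapout[SKETCH_NR_CPUS];	/* one slot per CPU, summed on read */
};

/* Plays the role of root_mem_cgroup in the patch. */
static struct sketch_mem_cgroup *sketch_root;

static bool sketch_is_root(const struct sketch_mem_cgroup *mem)
{
	return mem == sketch_root;
}

/* Simplified res_counter_charge(): fail if the limit would be exceeded. */
static int sketch_counter_charge(struct sketch_res_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/* Same idea as the __mem_cgroup_try_charge() hunk: the root group is never
 * charged, so its counter is not touched at all. */
static int sketch_try_charge(struct sketch_mem_cgroup *mem)
{
	if (sketch_is_root(mem))
		return 0;
	return sketch_counter_charge(&mem->res, SKETCH_PAGE_SIZE);
}

/* Same idea as mem_cgroup_swap_statistics(): each CPU updates only its own
 * slot. The kernel pins the CPU with get_cpu(); here the caller passes it. */
static void sketch_swap_statistics(struct sketch_mem_cgroup *mem, int cpu,
				   bool charge)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	mem->swapout[cpu] += charge ? 1 : -1;
}

/* Readers pay the cost instead: sum every per-CPU slot. */
static long sketch_swapout_total(const struct sketch_mem_cgroup *mem)
{
	long total = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		total += mem->swapout[cpu];
	return total;
}
```

Note that in the model the root group has a limit of zero and still "charges" successfully, because the bypass returns before the counter is consulted; that is the behaviour that lets systems with everything in the root group avoid the counter overhead.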
Re: arch fun.
Dave Jones wrote:
>> 2. Will we eventually rename kernel-PAE.686 to kernel.686?
>
> I don't think we can, otherwise someone with non-PAE 686's who does an
> update will suddenly find themselves unable to boot.

Hi Dave,

I was thinking about this for a little while. Can't we do this instead:

1. move kernel-PAE.686 config options to kernel.686 (I'm going to refer to this as the new kernel.686)
2. kill kernel-PAE.686
3. modify the spec file for the new kernel.686 to obsolete kernel-PAE.686?

I'm probably missing something obvious but having PAE in there seems strange to me.

P.

___
Fedora-kernel-list mailing list
Fedora-kernel-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-kernel-list
Re: arch fun.
> Part of the problem with that idea is that the Pentium M laptops without
> PAE aren't that old. This might upset quite a few people.

Right -- and that's a good point to keep in mind. IMO we shouldn't break *any* systems when we do this change.

Given the other information coming through (about dynamic kernel PAE enable), should we really be doing this right now? Why not wait for the dynamic PAE stuff to settle upstream and then make the change? Then we can properly (IMO) drop kernel-PAE.686 and stick with kernel.686.

What happens if we postpone this until F12?

P.

> Dave
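Whether a given machine (for example one of the older Pentium M laptops) can boot a PAE kernel comes down to whether the CPU advertises PAE. One way to check from user space on x86 is to look for the "pae" token on the "flags" line of /proc/cpuinfo. The sketch below assumes the x86 /proc/cpuinfo layout ("flags : fpu vme ... pae ..."); the function names are hypothetical, and on other architectures it simply reports -1 because no "flags" line is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` occurs in `flags` as a whole whitespace-separated token,
 * so "pae" does not match inside a longer flag name. */
static bool cpu_has_flag(const char *flags, const char *flag)
{
	size_t n;
	const char *p = flags;

	assert(flags != NULL && flag != NULL);
	n = strlen(flag);
	if (n == 0)
		return false;
	while ((p = strstr(p, flag)) != NULL) {
		bool start_ok = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		bool end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' ||
			      p[n] == '\n';

		if (start_ok && end_ok)
			return true;
		p += n;
	}
	return false;
}

/* Reads the first "flags" line of /proc/cpuinfo (x86 layout assumed).
 * Returns 1 if "pae" is listed, 0 if not, -1 if no such line was read. */
static int cpuinfo_has_pae(void)
{
	char line[8192];
	int result = -1;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		const char *colon;

		if (strncmp(line, "flags", 5) != 0)
			continue;
		colon = strchr(line, ':');
		if (colon != NULL)
			result = cpu_has_flag(colon + 1, "pae") ? 1 : 0;
		break;
	}
	fclose(f);
	return result;
}
```

The token check matters because a plain strstr() would also match "pae" inside a longer, unrelated flag name.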
Re: arch fun.
Dave Jones wrote:
> On Fri, Feb 06, 2009 at 12:34:04PM -0500, Prarit Bhargava wrote:
>  > Given the other information coming through (about dynamic kernel PAE
>  > enable), should we really be doing this right now?
>
> it's vaporware.
>
>  > Why not wait for the dynamic PAE stuff to settle upstream and then make
>  > the change?
>
> no-one seems to actually be doing anything.

... grr... /me hates it when that happens

P.
Re: arch fun.
Dave Jones wrote:
> As per the discussion in #fedora-meeting today, we're killing off
> kernel-i686, and just shipping..
>
>  * kernel.i586
>  * kernel-PAE.686
>
> Patch below seems to dtrt.. comments?

Two quick questions, Dave.

1. This is for F11?
2. Will we eventually rename kernel-PAE.686 to kernel.686?

P.
Re: [Fwd: [PATCH 1/1] cciss: fix regression, sysfs symlink missing]
Doug Chapman wrote:
> This patch has been submitted upstream but I don't know if it will get
> pulled in to Fedora through the normal channels prior to F10 or not.
> Without this patch Fedora 10 will not install on cciss which breaks
> nearly all HP server systems.

Thanks, I think it is important to get this in for HP systems (which I often use to test with)...

Chuck, Dave? Think we can take this one-liner in?

P.

> - Doug

Subject: [PATCH 1/1] cciss: fix regression, sysfs symlink missing
From: Mike Miller [EMAIL PROTECTED]
Date: Tue, 14 Oct 2008 13:46:49 -0500
To: Andrew Morton [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: LKML [EMAIL PROTECTED], LKML-scsi [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]

Patch 1 of 1

This patch fixes a regression where the device symlink to the pci address is not created. Offending commit 6ae5ce8e8d4de666f31286808d2285aa6a50fa40, cciss: remove redundant code.

Please consider this for inclusion.

Signed-off-by: Mike Miller [EMAIL PROTECTED]

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 1e1f915..44fb98e 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1365,6 +1365,7 @@ static void cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
 	disk->first_minor = drv_index << NWD_SHIFT;
 	disk->fops = &cciss_fops;
 	disk->private_data = &h->drv[drv_index];
+	disk->driverfs_dev = &(hba[drv_index]->pdev->dev);
 
 	/* Set up queue information */
 	blk_queue_bounce_limit(disk->queue, h->pdev->dma_mask);
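The one-liner sets disk->driverfs_dev so that, when the disk is registered, sysfs creates the "device" symlink from the disk's directory to its parent PCI device. A quick way to confirm the fix on a test box is to check that this link exists. The helper below is a hypothetical user-space sketch; the example path /sys/block/cciss!c0d0 is an assumption (sysfs replaces the '/' in "cciss/c0d0" with '!'), so adjust it to the disk actually present.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: if <blockdir>/device is a symlink, copy its target
 * into `target` (NUL-terminated) and return 1; otherwise return 0. */
static int device_link_target(const char *blockdir, char *target, size_t len)
{
	char path[4096];
	struct stat st;
	ssize_t n;

	assert(blockdir != NULL && target != NULL && len > 1);
	if (snprintf(path, sizeof(path), "%s/device", blockdir) >= (int)sizeof(path))
		return 0;
	/* lstat() inspects the link itself rather than what it points to. */
	if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
		return 0;
	n = readlink(path, target, len - 1);
	if (n < 0)
		return 0;
	target[n] = '\0';	/* readlink() does not NUL-terminate */
	return 1;
}
```

Called as device_link_target("/sys/block/cciss!c0d0", buf, sizeof(buf)), it should return 0 on a kernel with the regression and 1 (with buf pointing into the PCI device tree) once the patch is applied.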
Re: rawhide -debug
> An idea that was tossed around was to do something similar to what we do
> in release builds, and offer separate debug/nodebug builds. But instead of
> how we do it in releases, do the opposite, and have a -nodebug build,
> whilst keeping the regular kernel debug-turned-on to maximise coverage
> testing.

Personally, I'd like to see this, but let's face it, we will always have situations where changing the timing of the kernel execution causes bugs to come and go. I guess there may be a certain amount of debug we have to live with.

P.

> Dave