[Fedora PATCH] Improve Resource Counter Scalability

2009-10-12 Thread Prarit Bhargava
This patch was sent to me by Balbir Singh, cc'd, who worked on the original
patch.  The patch results in a massive increase in performance on a 64p/32G
system.  The patch was successfully compiled and tested by me on fedora-latest.

From the upstream commit:

Data from Prarit (kernel compile with make -j64 on a 64
CPU/32G machine)

For a single run

Without patch

real 27m8.988s
user 87m24.916s
sys 382m6.037s

With patch

real 4m18.607s
user 84m58.943s
sys 50m52.682s

With config turned off

real 4m54.972s
user 90m13.456s
sys 50m19.711s

NOTE: The data looks counterintuitive due to the increased performance
with the patch, even over the config being turned off. We probably need
more runs, but so far all testing has shown that the patches definitely
help.
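
For a rough sense of scale, the single-run numbers above can be turned into ratios. The snippet below is only a sketch of that arithmetic; `to_seconds` and `speedup` are hypothetical helper names, not anything from the patch, and one run is not a benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style "XmY.Zs" reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Ratio of unpatched to patched time; > 1 means the patched run was faster. */
static double speedup(double baseline_s, double patched_s)
{
	assert(patched_s > 0.0);
	return baseline_s / patched_s;
}

/* Print the ratios for the "without patch" vs. "with patch" run. */
void report_speedups(void)
{
	printf("real: %.2fx\n",
	       speedup(to_seconds(27, 8.988), to_seconds(4, 18.607)));
	printf("sys:  %.2fx\n",
	       speedup(to_seconds(382, 6.037), to_seconds(50, 52.682)));
}
```

With the figures quoted above this works out to roughly 6.3x on wall-clock time and roughly 7.5x on system time, while user time is almost unchanged, which is what one would expect if the gain comes from time no longer spent in the kernel.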

---

From: Balbir Singh bal...@linux.vnet.ibm.com

Backport 0c3e73e84fe3f64cf1c2e8bb4e91e8901cbcdc38
(memcg: improve resource counter scalability) to 2.6.31.
It is a very useful patch for non-users of the memory control
group, as it reduces the overhead quite significantly.
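
The idea visible in the diff that follows is that charges against the root group skip the shared resource counter entirely. A minimal user-space sketch of that pattern, assuming a simplified counter (all names here, `struct counter`, `struct group`, `group_charge`, `group_uncharge`, are hypothetical stand-ins and not the kernel API; `lock_acquisitions` merely models how often the counter's lock would be taken):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	unsigned long lock_acquisitions; /* models contention on the lock */
};

/* Hypothetical stand-in for struct mem_cgroup. */
struct group {
	struct counter res;
};

/* Set once at start-up, read on every charge (cf. root_mem_cgroup). */
static struct group *root_group;

static bool group_is_root(const struct group *g)
{
	return g == root_group;
}

/* Charge size bytes; returns 0 on success, -1 if the limit would be exceeded. */
int group_charge(struct group *g, unsigned long size)
{
	int ret = 0;

	if (group_is_root(g))
		return 0;	/* root is never limited: skip the shared counter */

	g->res.lock_acquisitions++;	/* a real counter would lock here */
	if (g->res.usage + size > g->res.limit)
		ret = -1;
	else
		g->res.usage += size;
	return ret;
}

/* Undo a successful charge; a no-op for the root group. */
void group_uncharge(struct group *g, unsigned long size)
{
	if (group_is_root(g))
		return;

	g->res.lock_acquisitions++;	/* a real counter would lock here */
	assert(g->res.usage >= size);
	g->res.usage -= size;
}
```

On a system where nobody has created any groups, every task lives in the root group, so in this model the hot path never touches the counter at all. Usage for the root group then has to come from somewhere else; the patch appears to rely on the existing per-CPU statistics for that.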

Signed-off-by: Balbir Singh bal...@linux.vnet.ibm.com
---

 mm/memcontrol.c |  127 ++-
 1 files changed, 106 insertions(+), 21 deletions(-)


diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..4821be0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES 5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -66,6 +67,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_MAPPED_FILE,  /* # of pages charged as file rss */
 	MEM_CGROUP_STAT_PGPGIN_COUNT,	/* # of pages paged in */
 	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
+	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
 
 	MEM_CGROUP_STAT_NSTATS,
 };
@@ -219,11 +221,24 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 
+static void mem_cgroup_swap_statistics(struct mem_cgroup *mem,
+					 bool charge)
+{
+	int val = (charge) ? 1 : -1;
+	struct mem_cgroup_stat *stat = &mem->stat;
+	struct mem_cgroup_stat_cpu *cpustat;
+	int cpu = get_cpu();
+
+	cpustat = &stat->cpustat[cpu];
+	__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_SWAPOUT, val);
+	put_cpu();
+}
+
 static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
 					 struct page_cgroup *pc,
 					 bool charge)
 {
-	int val = (charge)? 1 : -1;
+	int val = (charge) ? 1 : -1;
 	struct mem_cgroup_stat *stat = &mem->stat;
 	struct mem_cgroup_stat_cpu *cpustat;
 	int cpu = get_cpu();
@@ -354,6 +369,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 	return ret;
 }
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -996,9 +1016,11 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 	VM_BUG_ON(css_is_removed(&mem->css));
 
 	while (1) {
-		int ret;
+		int ret = 0;
 		bool noswap = false;
 
+		if (mem_cgroup_is_root(mem))
+			goto done;
 		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
 		if (likely(!ret)) {
 			if (!do_swap_account)
@@ -1046,6 +1068,7 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 			goto nomem;
 		}
 	}
+done:
 	return 0;
 nomem:
 	css_put(&mem->css);
@@ -1119,9 +1142,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 	lock_page_cgroup(pc);
 	if (unlikely(PageCgroupUsed(pc))) {
 		unlock_page_cgroup(pc);
-		res_counter_uncharge(&mem->res, PAGE_SIZE);
-		if (do_swap_account)
-			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+		if (!mem_cgroup_is_root(mem)) {
+			res_counter_uncharge(&mem->res, PAGE_SIZE);
+			if (do_swap_account)
+				res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+		}
 		css_put(&mem->css);
 		return;
 	}
@@ -1178,7 +1203,8 @@ static int mem_cgroup_move_account(struct page_cgroup *pc,
 	if (pc->mem_cgroup != from)
 		goto out;
 
-   

Re: arch fun.

2009-02-06 Thread Prarit Bhargava



Dave Jones wrote:
>> 2.  Will we eventually rename kernel-PAE.686 to kernel.686?
>
> I don't think we can, otherwise someone with non-PAE 686's who
> does an update will suddenly find themselves unable to boot.


Hi Dave,

I was thinking about this for a little while.

Can't we do this instead:

1.  move kernel-PAE.686 config options to kernel.686 (I'm going to refer 
to this as the new kernel.686)

2.  kill kernel-PAE.686
3.  modify the spec file for the new kernel.686 to obsolete 
kernel-PAE.686 ?


I'm probably missing something obvious but having PAE in there seems 
strange to me.


P.

___
Fedora-kernel-list mailing list
Fedora-kernel-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-kernel-list


Re: arch fun.

2009-02-06 Thread Prarit Bhargava



> Part of the problem with that idea is that the Pentium M laptops without PAE
> aren't that old. This might upset quite a few people.

  


Right -- and that's a good point to keep in mind.  IMO we shouldn't 
break *any* systems when we do this change.


Given the other information coming through (about dynamic kernel PAE 
enable), should we really be doing this right now?


Why not wait for the dynamic PAE stuff to settle upstream and then make 
the change?  Then we can properly (IMO) drop kernel-PAE.686 and stick 
with kernel.686.


What happens if we postpone this until F12?

P.

> Dave

  




Re: arch fun.

2009-02-06 Thread Prarit Bhargava



Dave Jones wrote:
> On Fri, Feb 06, 2009 at 12:34:04PM -0500, Prarit Bhargava wrote:
>
>> Given the other information coming through (about dynamic kernel PAE
>> enable), should we really be doing this right now?
>
> it's vaporware.
>
>> Why not wait for the dynamic PAE stuff to settle upstream and then make
>> the change?
>
> no-one seems to actually be doing anything.


... grr...

/me hates it when that happens

P.



Re: arch fun.

2009-02-05 Thread Prarit Bhargava



Dave Jones wrote:
> As per the discussion in #fedora-meeting today,
> we're killing off kernel-i686, and just shipping..
>
> * kernel.i586
> * kernel-PAE.686
>
> Patch below seems to dtrt.. comments?


Two quick questions Dave.

1.  This is for F11?
2.  Will we eventually rename kernel-PAE.686 to kernel.686?

P.



Re: [Fwd: [PATCH 1/1] cciss: fix regression, sysfs symlink missing]

2008-10-15 Thread Prarit Bhargava



Doug Chapman wrote:
> This patch has been submitted upstream but I don't know if it will get
> pulled in to Fedora through the normal channels prior to F10 or not.
> Without this patch Fedora 10 will not install on cciss which breaks
> nearly all HP server systems.
>
> thanks,


I think it is important to get this in for HP systems (which I often use 
to test with)...


Chuck, Dave?  Think we can take this one-liner in?

P.


> - Doug

Subject: [PATCH 1/1] cciss: fix regression, sysfs symlink missing
From: Mike Miller [EMAIL PROTECTED]
Date: Tue, 14 Oct 2008 13:46:49 -0500
To: Andrew Morton [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: LKML [EMAIL PROTECTED], LKML-scsi [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]



Patch 1 of 1

This patch fixes a regression where the device symlink to the pci address is
not created. The offending commit is 6ae5ce8e8d4de666f31286808d2285aa6a50fa40,
"cciss: remove redundant code".

Please consider this for inclusion.

Signed-off-by: Mike Miller [EMAIL PROTECTED]

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 1e1f915..44fb98e 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1365,6 +1365,7 @@ static void cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
 	disk->first_minor = drv_index << NWD_SHIFT;
 	disk->fops = &cciss_fops;
 	disk->private_data = &h->drv[drv_index];
+	disk->driverfs_dev = &(hba[drv_index]->pdev->dev);
 
 	/* Set up queue information */
 	blk_queue_bounce_limit(disk->queue, h->pdev->dma_mask);
  





Re: rawhide -debug

2008-02-14 Thread Prarit Bhargava


> An idea that was tossed around was to do something similar to what
> we do in release builds, and offer separate debug/nodebug builds.
> But instead of how we do it in releases, do the opposite, and have
> a -nodebug build, whilst keeping the regular kernel debug-turned-on
> to maximise coverage testing.


Personally, I'd like to see this, but let's face it, we will always have 
situations where changing the timing of kernel execution causes bugs 
to come and go.  I guess there may be a certain amount of debug 
we have to live with.


P.

> Dave

  

