Re: [PATCH 0/5] SUBCPUSETS: a resource control functionality using CPUSETS

2005-09-08 Thread Dinakar Guniguntala

Interesting implementation of resource controls. Cross-posting this
to ckrm-tech as well. I am sure the CKRM folks will have something to say...

Any thoughts on how you want to add more resource control features
on top of, or in addition to, this setup (such as memory, etc.)?


On Thu, Sep 08, 2005 at 12:23:23AM -0700, Paul Jackson wrote:
> I'm guessing you do not want such cpusets (the parents of subcpusets)
> to overlap, because if they did, it would seem to confuse the meaning
> of getting a fixed proportion of available cpu and memory resources.  I
> was a little surprised not to see any additional checks that
> cpu_exclusive and mem_exclusive must be set true in these cpusets, to
> insure non-overlapping cpusets.

I agree with Paul here. You would want to build your controllers
on top of exclusive cpusets to keep things sane.
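
As a concrete illustration (purely hypothetical, not from the posted patches),
the kind of check Paul is asking about could sit in the subcpuset setup path
and reuse the existing cpuset helpers:

/*
 * Hypothetical sketch only -- not from the SUBCPUSETS patches.  Refuse to
 * hang resource-control parameters off a cpuset unless it is both cpu and
 * mem exclusive, so the parents of subcpusets can never overlap and "a
 * fixed proportion of the parent's resources" stays well defined.
 * is_cpu_exclusive() and is_mem_exclusive() are the existing helpers in
 * kernel/cpuset.c.
 */
static int validate_subcpuset_parent(const struct cpuset *parent)
{
        if (!is_cpu_exclusive(parent) || !is_mem_exclusive(parent))
                return -EINVAL;
        return 0;
}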

> On the other hand, Dinakar had more work to do than you might, because
> he needed a complete covering (so had to round up cpus in non exclusive
> cpusets to form more covering elements).  From what I can tell, you
> don't need a complete covering - it seems fine if some cpus are not
> managed by this resource control function.


I think it makes more sense to add this functionality directly as part
of the existing cpusets, instead of creating further leaf cpusets (or subcpusets
as you call them) in which we can specify resource control parameters. I think that
approach would be much more intuitive and simpler to work with than
what you have currently.

-Dinakar


Re: [PATCH] ia64 cpuset + build_sched_domains() mangles structures

2005-09-02 Thread Dinakar Guniguntala
Andrew,

Please include the patch below into -mm. I had reported a problem
with this patch earlier on 2.6.13-rc6, but I am just not able to
reproduce the problem on newer kernels (2.6.13 and 2.6.13-mm1).

I have tested this extensively on a Power5 box and I believe
that John Hawkes has tested this on ia64 as well.

The patch is here

http://marc.theaimsgroup.com/?l=linux-ia64&m=112474434128996&w=2


Regards,

Dinakar



On Mon, Aug 22, 2005 at 06:07:19PM +0200, Ingo Molnar wrote:
> 
> * Dinakar Guniguntala <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, Aug 22, 2005 at 09:08:34AM +0200, Ingo Molnar wrote:
> > > 
> > > in terms of 2.6.14, the replacement patch below also does what i always 
> > > wanted to do: to merge the ia64-specific build_sched_domains() code back 
> > > into kernel/sched.c. I've done this by taking your improved dynamic 
> > > build-domains code and putting it into kernel/sched.c.
> > > 
> > 
> > Ingo, one change required to your patch and the exclusive
> > cpuset functionality seems to work fine on a NUMA ppc64 box.
> > I am still running some of my dynamic sched domain tests. So far
> > it seems to be holding ok.
> 
> great! Andrew, i'd suggest we try the merged patch attached below in 
> -mm.
> 
> > Any idea why the ia64 stuff was forked in the first place?
> 
> most of the NUMA domain-trees stuff happened in the ia64 space so there 
> was a natural desire to keep it more hackable there. But now i think 
> it's getting counterproductive.
> 
>   Ingo
> 
> -
> I've already sent this to the maintainers, and this is now being sent to a
> larger community audience.  I have fixed a problem with the ia64 version of
> build_sched_domains(), but a similar fix still needs to be made to the
> generic build_sched_domains() in kernel/sched.c.
> 
> The "dynamic sched domains" functionality has recently been merged into
> 2.6.13-rcN that sees the dynamic declaration of a cpu-exclusive (a.k.a.
> "isolated") cpuset and rebuilds the CPU Scheduler sched domains and sched
> groups to separate away the CPUs in this cpu-exclusive cpuset from the
> remainder of the non-isolated CPUs.  This allows the non-isolated CPUs to
> completely ignore the isolated CPUs when doing load-balancing.
> 
> Unfortunately, build_sched_domains() expects that a sched domain will
> include all the CPUs of each node in the domain, i.e., that no node will
> belong in both an isolated cpuset and a non-isolated cpuset.  Declaring
> a cpuset that violates this presumption will produce flawed data
> structures and will oops the kernel.
> 
> To trigger the problem (on a NUMA system with >1 CPUs per node):
>cd /dev/cpuset
>mkdir newcpuset
>cd newcpuset
>echo 0 >cpus
>echo 0 >mems
>echo 1 >cpu_exclusive
> 
> I have fixed this shortcoming for ia64 NUMA (with multiple CPUs per node).
> A similar shortcoming exists in the generic build_sched_domains() (in
> kernel/sched.c) for NUMA, and that needs to be fixed also.  The fix involves
> dynamically allocating sched_group_nodes[] and sched_group_allnodes[] for
> each invocation of build_sched_domains(), rather than using global arrays
> for these structures.  Care must be taken to remember kmalloc() addresses
> so that arch_destroy_sched_domains() can properly kfree() the new dynamic
> structures.
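
[A rough sketch of the allocation scheme described above -- not the actual
patch: the per-node sched_group array becomes a per-call kmalloc() instead of
a file-scope global, and the pointer is remembered so the teardown path can
kfree() it.  The real patch does the same for sched_group_allnodes[].]

static struct sched_group **sched_group_nodes;  /* remembered for teardown */

static void build_sched_domains(const cpumask_t *cpu_map)
{
        /* per-invocation allocation instead of a global array */
        sched_group_nodes = kmalloc(sizeof(struct sched_group *) *
                                    MAX_NUMNODES, GFP_KERNEL);
        if (!sched_group_nodes)
                return;
        memset(sched_group_nodes, 0,
               sizeof(struct sched_group *) * MAX_NUMNODES);

        /* ... build domains and groups for the CPUs in *cpu_map ... */
}

static void arch_destroy_sched_domains(const cpumask_t *cpu_map)
{
        /* ... detach the domains ... */
        kfree(sched_group_nodes);
        sched_group_nodes = NULL;
}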
> 
> This is a patch against 2.6.13-rc6.
> 
> Signed-off-by: John Hawkes <[EMAIL PROTECTED]>
> 
> reworked the patch to also move the ia64 domain setup code to the generic
> code.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> 
> ppc64 fix
> 
> From: Dinakar Guniguntala <[EMAIL PROTECTED]>
> 
>  arch/ia64/kernel/domain.c    |  400 ---
>  arch/ia64/kernel/Makefile    |    2
>  include/asm-ia64/processor.h |    3
>  include/asm-ia64/topology.h  |   22 --
>  include/linux/sched.h        |    9
>  include/linux/topology.h     |   22 ++
>  kernel/sched.c               |  290 +--
>  7 files changed, 259 insertions(+), 489 deletions(-)
> 
> Index: linux-sched-curr/arch/ia64/kernel/Makefile
> ===
> --- linux-sched-curr.orig/arch/ia64/kernel/Makefile
> +++ linux-sched-curr/arch/ia64/kernel/Makefile
> @@ -16,7 +16,7 @@ obj-$(CONFIG_IA64_HP_ZX1_SWIOTLB) += acp
>  obj-$(CONFIG_IA64_PALINFO)   += palinfo.o
>  obj-$(CONFIG_IOSAPIC)+= iosapic.o
>  obj-$(CONFIG_MODULES)+= module.o
> -obj-$(CONFIG_SMP)+= smp.o smpboot.o domain.o
> +obj-$(CONFIG_SMP)+= smp.o smpboo

Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains build fix

2005-08-25 Thread Dinakar Guniguntala
On Wed, Aug 24, 2005 at 01:31:07PM -0700, Paul Jackson wrote:
> ==
> 
> The safest, mind numbingly simple thing to do that would avoid the oops
> that Hawkes reported is to simply not have the cpuset code call the
> code to setup a dynamic sched domain.  This is choice (2) above, and
> could be done at the last hour with relative safety.
> 
> Here is an untested patch that does (2):
> 
> =
> 
> Index: linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
> ===
> --- linux-2.6.13-cpuset-mempolicy-migrate.orig/kernel/cpuset.c
> +++ linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
> @@ -627,6 +627,15 @@ static int validate_change(const struct 
>   * Call with cpuset_sem held.  May nest a call to the
>   * lock_cpu_hotplug()/unlock_cpu_hotplug() pair.
>   */
> +
> +/*
> + * Hack to avoid 2.6.13 partial node dynamic sched domain bug.
> + * Disable letting 'cpu_exclusive' cpusets define dynamic sched
> + * domains, until the sched domain can handle partial nodes.
> + * Remove this ifdef hackery when sched domains fixed.
> + */
> +#define DISABLE_EXCLUSIVE_CPU_DOMAINS 1
> +#ifdef DISABLE_EXCLUSIVE_CPU_DOMAINS
>  static void update_cpu_domains(struct cpuset *cur)
>  {
>   struct cpuset *c, *par = cur->parent;
> @@ -667,6 +676,11 @@ static void update_cpu_domains(struct cp
>   partition_sched_domains(&pspan, &cspan);
>   unlock_cpu_hotplug();
>  }
> +#else
> +static void update_cpu_domains(struct cpuset *cur)
> +{
> +}
> +#endif
>  
>  static int update_cpumask(struct cpuset *cs, char *buf)
>  {
> 
> 
> =
> 

I'll ack this for now, until I fix the problems that I am seeing
on ppc64.


Acked-by: Dinakar Guniguntala <[EMAIL PROTECTED]>




Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains build fix

2005-08-24 Thread Dinakar Guniguntala
Paul,

Can we hold on to this patch for a while? As I reported yesterday,
this hangs up my ppc64 box on doing rmdir on an exclusive cpuset.
Still debugging the problem, hope to have a fix soon. Thanks

-Dinakar


On Wed, Aug 24, 2005 at 04:15:10AM -0700, Paul Jackson wrote:
> As reported by Paul Mackerras <[EMAIL PROTECTED]>, the previous
> patch "cpu_exclusive sched domains fix" broke the ppc64 build,
> yielding error messages:
> 
> kernel/cpuset.c: In function 'update_cpu_domains':
> kernel/cpuset.c:648: error: invalid lvalue in unary '&'
> kernel/cpuset.c:648: error: invalid lvalue in unary '&'
> 
> On some arch's, the node_to_cpumask() is a function, returning
> a cpumask_t.  But the for_each_cpu_mask() requires an lvalue mask.
> 
> The following patch fixes this build failure by making a copy
> of the cpumask_t on the stack.
> 
> I have _not_ yet tried to build this for ppc64 - just for ia64.
> I will try that now.  But the fix seems obvious enough that it
> is worth sending out now.
> 
> Signed-off-by: Paul Jackson <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
> ===
> --- linux-2.6.13-cpuset-mempolicy-migrate.orig/kernel/cpuset.c
> +++ linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
> @@ -645,7 +645,9 @@ static void update_cpu_domains(struct cp
>   int i, j;
>  
>   for_each_cpu_mask(i, cur->cpus_allowed) {
> - for_each_cpu_mask(j, node_to_cpumask(cpu_to_node(i))) {
> + cpumask_t mask = node_to_cpumask(cpu_to_node(i));
> +
> + for_each_cpu_mask(j, mask) {
>   if (!cpu_isset(j, cur->cpus_allowed))
>   return;
>   }
> 
> -- 
>   I won't rest till it's the best ...
>   Programmer, Linux Scalability
>   Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373


Re: Linux-2.6.13-rc7

2005-08-24 Thread Dinakar Guniguntala
On Wed, Aug 24, 2005 at 07:43:42AM +0100, Al Viro wrote:
> On Tue, Aug 23, 2005 at 10:08:13PM -0700, Linus Torvalds wrote:
> 
> >   cpu_exclusive sched domains on partial nodes temp fix
> 
> ... breaks ppc64 since there we have node_to_cpumask() done as inlined
> function, not a macro.  So we get __first_cpu(&node_to_cpumask(...),...),
> with obvious consequences.
> 
> Locally I'm turning node_to_cpumask() into define, just to see what else
> had changed in the build, but we probably want saner solution for that
> one...

Not sure why this patch was included. I had reported yesterday that
it hangs up ppc64 on doing some exclusive cpuset operations. (I had
fixed the compile problem by having a temp for the cpumask variable.)

So this patch is not ready to go in just yet. I am working on the fix,
hope to have it soon.
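
(For reference, the temp fix mentioned above has the same shape as the hunk in
Paul's build-fix patch earlier in this digest; roughly, in
update_cpu_domains()-style context:)

        /*
         * Sketch of the "temp for the cpumask variable" fix: on ppc64,
         * node_to_cpumask() is an inline function, so its return value
         * must first be copied into a local lvalue before
         * for_each_cpu_mask() (which takes the mask's address) can
         * iterate over it.
         */
        cpumask_t mask = node_to_cpumask(cpu_to_node(i));

        for_each_cpu_mask(j, mask) {
                if (!cpu_isset(j, cur->cpus_allowed))
                        return;
        }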

-Dinakar


Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains on partial nodes temp fix

2005-08-23 Thread Dinakar Guniguntala
On Tue, Aug 23, 2005 at 01:04:27AM -0700, Paul Jackson wrote:
> If Dinakar, Hawkes and Nick concur (and no one else complains too
> loud) then the following should go into 2.6.13, to avoid the potential
> kernel oops that Hawkes reported in Dinakar's feature to allow user
> control of dynamic sched domain placement using cpu_exclusive cpusets.

I agree this is the way to go for 2.6.13 before we fix things the
right way for 2.6.14. Thanks for the patch Paul.

> This patch should allow proceeding with this new feature in 2.6.13 for
> the configurations in which it is useful (node aligned sched domains)
> while avoiding trying to setup sched domains in the less useful cases
> that can cause the kernel corruption and oops.
> 

Dunno if it is something in my setup (4 CPU Power5 box with NUMA enabled)
but this patch causes some hard hangs when I run the attached script.
The same script runs for much longer with Ingo's changes but panics
as I had described earlier. I am still debugging what causes this.

-Dinakar




sd-stress.tar.gz
Description: GNU Zip compressed data


Re: [PATCH] ia64 cpuset + build_sched_domains() mangles structures

2005-08-22 Thread Dinakar Guniguntala
On Mon, Aug 22, 2005 at 09:08:34AM +0200, Ingo Molnar wrote:
> 
> in terms of 2.6.14, the replacement patch below also does what i always 
> wanted to do: to merge the ia64-specific build_sched_domains() code back 
> into kernel/sched.c. I've done this by taking your improved dynamic 
> build-domains code and putting it into kernel/sched.c.
> 

Ingo, with one change to your patch (below), the exclusive
cpuset functionality seems to work fine on a NUMA ppc64 box.
I am still running some of my dynamic sched domain tests. So far
it seems to be holding ok.
Any idea why the ia64 stuff was forked in the first place?

The patch below is on top of your patch. (This is the earlier patch
John had sent)

-Dinakar

diff -Naurp linux-2.6.13-rc6.ingo/kernel/sched.c linux-2.6.13-rc6/kernel/sched.c
--- linux-2.6.13-rc6.ingo/kernel/sched.c2005-08-22 19:23:06.0 
+0530
+++ linux-2.6.13-rc6/kernel/sched.c 2005-08-22 19:36:45.0 +0530
@@ -5192,7 +5192,7 @@ next_sg:
 #endif
 
/* Attach the domains */
-   for_each_online_cpu(i) {
+   for_each_cpu_mask(i, *cpu_map) {
struct sched_domain *sd;
 #ifdef CONFIG_SCHED_SMT
sd = &per_cpu(cpu_domains, i);


Re: [PATCH] ia64 cpuset + build_sched_domains() mangles structures

2005-08-22 Thread Dinakar Guniguntala
On Tue, Aug 23, 2005 at 01:46:26AM +0530, Dinakar Guniguntala wrote:
> On Mon, Aug 22, 2005 at 06:07:19PM +0200, Ingo Molnar wrote:
> > great! Andrew, i'd suggest we try the merged patch attached below in 
> > -mm.
> > 
> 
> Ingo, unfortunately I am hitting panic's on stress testing. The panic
> screen is attached in the .png below.

Sorry, forgot to add the .png. Here it is...

> 
> On debugging I found that the panic happens consistently in this line
>  of code in function find_busiest_group
> 
>   *imbalance = min((max_load - avg_load) * busiest->cpu_power,
> (avg_load - this_load) * this->cpu_power)
> / SCHED_LOAD_SCALE;
> 
> Here I find that the "this" pointer is still NULL. I verified this by
> a quick hack as below in the same function and with this hack it seems 
> to run for hours
> 
> - if (!busiest || this_load >= max_load)
> + if (!this || !busiest || this_load >= max_load)
> 
> This can only happen if the none of the sched groups pointed to by the 
> 'sd' of the current cpu contain the current cpu. I was wondering if
> this had anything to do with the way that we are using RCU to assign/
> read the 'sd' pointer.
> 
> Any thoughts ??
> 
>   -Dinakar
> 


sd-panic.png
Description: PNG image


Re: [PATCH] ia64 cpuset + build_sched_domains() mangles structures

2005-08-22 Thread Dinakar Guniguntala
On Mon, Aug 22, 2005 at 06:07:19PM +0200, Ingo Molnar wrote:
> great! Andrew, i'd suggest we try the merged patch attached below in 
> -mm.
> 

Ingo, unfortunately I am hitting panics during stress testing. The panic
screen is attached in the .png below.

On debugging I found that the panic happens consistently on this line
of code in find_busiest_group():

	*imbalance = min((max_load - avg_load) * busiest->cpu_power,
			 (avg_load - this_load) * this->cpu_power)
				/ SCHED_LOAD_SCALE;

Here I find that the "this" pointer is still NULL. I verified this by
a quick hack (below) in the same function, and with this hack it seems
to run for hours:

-   if (!busiest || this_load >= max_load)
+   if (!this || !busiest || this_load >= max_load)
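
(In context, the hack amounts to bailing out of the balancing path before the
imbalance computation can dereference the NULL pointer -- a sketch based on the
2.6.13-era find_busiest_group() shape, simplified:)

        /* Sketch only: treat a missing 'this' group like an already
         * balanced domain instead of computing the imbalance with a
         * NULL pointer. */
        if (!this || !busiest || this_load >= max_load)
                goto out_balanced;

        *imbalance = min((max_load - avg_load) * busiest->cpu_power,
                         (avg_load - this_load) * this->cpu_power)
                                / SCHED_LOAD_SCALE;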

This can only happen if none of the sched groups pointed to by the
'sd' of the current cpu contain the current cpu. I was wondering if
this had anything to do with the way that we are using RCU to assign/
read the 'sd' pointer.

Any thoughts ??

-Dinakar



[RFC PATCH] Dynamic sched domains aka Isolated cpusets (v0.2)

2005-04-21 Thread Dinakar Guniguntala

Based on Paul's feedback, I have simplified and cleaned up the
code quite a bit.

o  I have taken care of most of the nits, except for the output
   format change for cpusets with isolated children.
o  Also, most of my documentation has been part of my earlier mails
   and I have not yet added it to cpusets.txt.
o  I still haven't looked at the memory side of things.
o  Most of the changes are in the cpusets code and almost none
   in the sched code. (I'll do that next week)
o  Hopefully my earlier mails regarding the design have clarified
   many of the questions that were raised

So here goes version 0.2

-rw-r--r--  1 root  root  16548 Apr 21 20:54 cpuset.o.orig
-rw-r--r--  1 root  root  17548 Apr 21 22:09 cpuset.o.sd-v0.2

  Around ~6% increase in kernel text size of cpuset.o

 include/linux/init.h  |    2
 include/linux/sched.h |    1
 kernel/cpuset.c       |  153 +-
 kernel/sched.c        |  111
 4 files changed, 216 insertions(+), 51 deletions(-)


diff -Naurp linux-2.6.12-rc1-mm1.orig/include/linux/init.h 
linux-2.6.12-rc1-mm1/include/linux/init.h
--- linux-2.6.12-rc1-mm1.orig/include/linux/init.h  2005-03-18 
07:03:49.0 +0530
+++ linux-2.6.12-rc1-mm1/include/linux/init.h   2005-04-21 21:54:06.0 
+0530
@@ -217,7 +217,7 @@ void __init parse_early_param(void);
 #define __initdata_or_module __initdata
 #endif /*CONFIG_MODULES*/
 
-#ifdef CONFIG_HOTPLUG
+#if defined(CONFIG_HOTPLUG) || defined(CONFIG_CPUSETS)
 #define __devinit
 #define __devinitdata
 #define __devexit
diff -Naurp linux-2.6.12-rc1-mm1.orig/include/linux/sched.h 
linux-2.6.12-rc1-mm1/include/linux/sched.h
--- linux-2.6.12-rc1-mm1.orig/include/linux/sched.h 2005-04-21 
21:50:26.0 +0530
+++ linux-2.6.12-rc1-mm1/include/linux/sched.h  2005-04-21 21:53:57.0 
+0530
@@ -155,6 +155,7 @@ typedef struct task_struct task_t;
 extern void sched_init(void);
 extern void sched_init_smp(void);
 extern void init_idle(task_t *idle, int cpu);
+extern void rebuild_sched_domains(cpumask_t span1, cpumask_t span2);
 
 extern cpumask_t nohz_cpu_mask;
 
diff -Naurp linux-2.6.12-rc1-mm1.orig/kernel/cpuset.c 
linux-2.6.12-rc1-mm1/kernel/cpuset.c
--- linux-2.6.12-rc1-mm1.orig/kernel/cpuset.c   2005-04-21 21:50:26.0 
+0530
+++ linux-2.6.12-rc1-mm1/kernel/cpuset.c2005-04-21 22:00:36.0 
+0530
@@ -57,7 +57,13 @@
 
 struct cpuset {
unsigned long flags;/* "unsigned long" so bitops work */
-   cpumask_t cpus_allowed; /* CPUs allowed to tasks in cpuset */
+   /* 
+* CPUs allowed to tasks in cpuset and 
+* not part of any isolated children
+*/
+   cpumask_t cpus_allowed; 
+
+   cpumask_t isolated_map; /* CPUs associated with isolated 
children */
nodemask_t mems_allowed;/* Memory Nodes allowed to tasks */
 
atomic_t count; /* count tasks using this cpuset */
@@ -82,6 +88,7 @@ struct cpuset {
 /* bits in struct cpuset flags field */
 typedef enum {
CS_CPU_EXCLUSIVE,
+   CS_CPU_ISOLATED,
CS_MEM_EXCLUSIVE,
CS_REMOVED,
CS_NOTIFY_ON_RELEASE
@@ -93,6 +100,11 @@ static inline int is_cpu_exclusive(const
return !!test_bit(CS_CPU_EXCLUSIVE, &cs->flags);
 }
 
+static inline int is_cpu_isolated(const struct cpuset *cs)
+{
+   return !!test_bit(CS_CPU_ISOLATED, &cs->flags);
+}
+
 static inline int is_mem_exclusive(const struct cpuset *cs)
 {
return !!test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
@@ -127,8 +139,10 @@ static inline int notify_on_release(cons
 static atomic_t cpuset_mems_generation = ATOMIC_INIT(1);
 
 static struct cpuset top_cpuset = {
-   .flags = ((1 << CS_CPU_EXCLUSIVE) | (1 << CS_MEM_EXCLUSIVE)),
+   .flags = ((1 << CS_CPU_EXCLUSIVE) | (1 << CS_CPU_ISOLATED) | 
+ (1 << CS_MEM_EXCLUSIVE)),
.cpus_allowed = CPU_MASK_ALL,
+   .isolated_map = CPU_MASK_NONE,
.mems_allowed = NODE_MASK_ALL,
.count = ATOMIC_INIT(0),
.sibling = LIST_HEAD_INIT(top_cpuset.sibling),
@@ -543,9 +557,14 @@ static void refresh_mems(void)
 
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
 {
-   return  cpus_subset(p->cpus_allowed, q->cpus_allowed) &&
+   cpumask_t all_map;
+
+   cpus_or(all_map, q->cpus_allowed, q->isolated_map);
+
+   return  cpus_subset(p->cpus_allowed, all_map) &&
nodes_subset(p->mems_allowed, q->mems_allowed) &&
is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
+   is_cpu_isolated(p) <= is_cpu_isolated(q) &&
is_mem_exclusive(p) <= is_mem_exclusive(q);
 }
 
@@ -587,6 +606,11 @@ static int validate_change(const struct 
if (!is_cpuset_subset(trial, par))
return -EACCES;
 
+   /* An isolated cpuset has to be exclusive */
+   if ((is_cpu_isolated

Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-21 Thread Dinakar Guniguntala
On Wed, Apr 20, 2005 at 12:09:46PM -0700, Paul Jackson wrote:
> Earlier, I wrote to Dinakar:
> > What are your invariants, and how can you assure yourself and us
> > that your code preserves these invariants?

OK, let me begin at the beginning and attempt to define what I am
trying to do here:

1. I need a method to isolate a random set of cpus in such a way that
   only the set of processes that are specifically assigned can
   make use of these CPUs
2. I need to ensure that the sched load balance code does not pull
   any tasks other than the assigned ones onto these cpus
3. I need to be able to create multiple such groupings of cpus
   that are disjoint from the rest and run only specified tasks
4. I need a user interface to specify which random set of cpus
   form such a grouping of disjoint cpus
5. I need to be able to dynamically create and destroy these
   grouping of disjoint cpus
6. I need to be able to add/remove cpus to/from this grouping


Now if you try to fit these requirements onto cpusets, keeping in mind
that it already has a user interface and some of the framework
required to create disjoint groupings of cpus:

1. An exclusive cpuset ensures that the cpus it has are disjoint from
   all other cpusets except its parent and children
2. So now I need a way to disassociate the cpus of an exclusive
   cpuset from its parent, so that this set of cpus is truly
   disjoint from the rest of the system.
3. After I have done (2) above, I now need to build two sets of sched
   domains corresponding to the cpus of this exclusive cpuset and the
   remaining cpus of its parent
4. Ensure that the current rules of non-isolated cpusets are all
   preserved such that if this feature is not used, all other features
   work as before

This is exactly what I have tried to do.

1. Maintain a flag to indicate whether a cpuset is isolated
2. Maintain an isolated_map for every cpuset. This contains a cache of 
   all cpus associated with isolated children
3. To isolate a cpuset x, x has to be an exclusive cpuset and its
   parent has to be an isolated cpuset
4. On isolating a cpuset by issuing
   /bin/echo 1 > cpu_isolated
   
   It ensures that conditions in (3) are satisfied and then removes the 
   cpus of the current cpuset from the parent cpus_allowed mask. (It also
   puts the cpus of the current cpuset into the isolated_map of its parent)
   This ensures that only the current cpuset and its children will have
   access to the now isolated cpus.
   It also rebuilds the sched domains into two new domains consisting of
   a. All cpus in the parent->cpus_allowed
   b. All cpus in current->cpus_allowed
5. Similarly, on setting isolated off on an isolated cpuset (or on doing
   an rmdir on an isolated cpuset), it adds all of the cpus of the current
   cpuset back into its parent cpuset's cpus_allowed mask and removes them
   from its parent's isolated_map

   This ensures that all of the cpus in the current cpuset are now
   visible to the parent cpuset.

   It now rebuilds only one sched domain consisting of all of the cpus
   in its parent's cpus_allowed mask.
6. You can also modify the cpus present in an isolated cpuset x provided
   that x does not have any children that are also isolated.
7. On adding or removing cpus from an isolated cpuset that does not
   have any isolated children, it reworks the parent cpuset's
   cpus_allowed and isolated_map masks and rebuilds the sched domains
   appropriately
8. Since the function update_cpu_domains, which does all of the
   above updates to the parent cpuset's masks, is always called with
   cpuset_sem held, all these changes are atomic. (A rough sketch of
   the isolation path follows below.)
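
A rough sketch of the isolation path described in steps 4 and 8 above (this is
a simplification for illustration, not the actual update_cpu_domains() from the
patch; it assumes the cpus_allowed/isolated_map fields and the two-span
rebuild_sched_domains() API from v0.2):

/*
 * Sketch only: isolate cpuset 'cs' under its parent, with cpuset_sem
 * held (which is what makes the mask updates atomic).  The CPUs of 'cs'
 * are removed from the parent's cpus_allowed mask, cached in the
 * parent's isolated_map, and two sched domains are rebuilt: one for the
 * parent's remaining CPUs and one for the newly isolated set.
 */
static void isolate_cpuset(struct cpuset *cs)
{
        struct cpuset *parent = cs->parent;
        cpumask_t pspan, cspan;

        /* parent loses direct access to the isolated CPUs ... */
        cpus_andnot(parent->cpus_allowed, parent->cpus_allowed,
                    cs->cpus_allowed);
        /* ... but remembers them in its isolated_map cache */
        cpus_or(parent->isolated_map, parent->isolated_map,
                cs->cpus_allowed);

        pspan = parent->cpus_allowed;   /* remaining CPUs of the parent  */
        cspan = cs->cpus_allowed;       /* CPUs now owned by this cpuset */

        /* two disjoint domains: [parent leftovers] [isolated cpuset] */
        rebuild_sched_domains(pspan, cspan);
}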


> > He removes cpus 4-5 from batch and adds them to cint
> 
> Could you spell out the exact steps the user would take, for this part
> of your example?  What does the user do, what does the kernel do in
> response, and what state the cpusets end up in, after each action of the
> user?


   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          0              0-7
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    1          4-7            2-3
 top/others/cint      2-3    1          2-3            0
 top/others/batch     4-7    0          4-7            0

At this point to remove cpus 4-5 from batch and add them to cint, the admin
would do the following steps

# Remove cpus 4-5 from batch
# batch is not a isolated cpuset and hence this step 
# has no other implications
/bin/echo 6-7 > /top/others/batch/cpus 

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          0              0-7
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    1          4-7            2-3
 top/others/cint      2-3    1          2-3

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-20 Thread Dinakar Guniguntala
On Tue, Apr 19, 2005 at 08:26:39AM -0700, Paul Jackson wrote:
>  * Your understanding of "cpu_exclusive" is not the same as mine.

Sorry for creating confusion by what I said earlier, I do understand
exactly what cpu_exclusive means. It's just that when I started
working on this (a long time ago) I had a different notion, and that is
what I was referring to. I probably should never have brought that up.

> 
> > Since isolated cpusets are trying to partition the system, this
> > can be restricted to only the first level of cpusets.
> 
> I do not think such a restriction is a good idea.  For example, lets say
> our 8 CPU system has the following cpusets:
> 

And my current implementation has no such restriction, I was only
suggesting that to simplify the code.

> 
> > Also I think we can add further restrictions in terms not being able
> > to change (add/remove) cpus within a isolated cpuset.
> 
> My approach agrees on this restriction.  Earlier I wrote:
> > Also note that adding or removing a cpu from a cpuset that has
> > its domain_cpu_current flag set true must fail, and similarly
> > for domain_mem_current.
> 
> This restriction is required in my approach because the CPUs in the
> domain_cpu_current cpusets (the isolated CPUs, in your terms) form a
> partition (disjoint cover) of the CPUs in the system, which property
> would be violated immediately if any CPU were added or removed from any
> cpuset defining the partition.

See my other note explaining how things work currently. I do feel that
this restriction is not a good idea.

-Dinakar


Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-20 Thread Dinakar Guniguntala
On Tue, Apr 19, 2005 at 10:23:48AM -0700, Paul Jackson wrote:
> 
> How does this play out in your interface?  Are you convinced that
> your invariants are preserved at all times, to all users?  Can
> you present a convincing argument to others that this is so?


Let me give an example of how the current version of isolated cpusets can
be used and hopefully clarify my approach.


Consider a system with 8 cpus that needs to run a mix of workloads.
One set of applications have low latency requirements and another
set have a mixed workload. The administrator decides to allot
2 cpus to the low latency application and the rest to other apps.
To do this, he creates two cpusets
(All cpusets are considered to be exclusive for this discussion)

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          0-7            0
 top/lowlat           0-1    0          0-1            0
 top/others           2-7    0          2-7            0

He now wants to partition the system along these lines as he wants
to isolate lowlat from the rest of the system to ensure that
a. No tasks from the parent cpuset (top_cpuset in this case)
   use these cpus
b. load balance does not run across all cpus 0-7

He does this by

cd /mount-point/lowlat
/bin/echo 1 > cpu_isolated

Internally it takes the cpuset_sem, does some sanity checks and ensures
that these cpus are not visible to any other cpuset including its parent
(by removing these cpus from its parent's cpus_allowed mask and adding
them to its parent's isolated_map) and then calls sched code to partition
the system as

[0-1] [2-7]

   The internal state of the data structures is as follows

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          2-7            0-1
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    0          2-7            0

---


The administrator now wants to further partition the "others" cpuset into
a cpu intensive application and a batch one

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          2-7            0-1
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    0          2-7            0
 top/others/cint      2-3    0          2-3            0
 top/others/batch     4-7    0          4-7            0


If now the administrator wants to isolate the cint cpuset...

cd /mount-point/others
/bin/echo 1 > cpu_isolated

(At this point no new sched domains are built
 as there exists a sched domain which exactly
 matches the cpus in the "others" cpuset.)

cd /mount-point/others/cint
/bin/echo 1 > cpu_isolated

At this point cpus from the "others" cpuset are also taken away from its
parent cpus_allowed mask and put into the parent's isolated_map. This means
that the parent cpus_allowed mask is empty.  This would now result in
partitioning the "others" cpuset and builds two new sched domains as follows

[2-3] [4-7]

Notice that the cpus 0-1 having already been isolated are not affected
in this operation

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          0              0-7
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    1          4-7            2-3
 top/others/cint      2-3    1          2-3            0
 top/others/batch     4-7    0          4-7            0

---

The admin now wants to run more applications in the cint cpuset
and decides to borrow a couple of cpus from the batch cpuset
He removes cpus 4-5 from batch and adds them to cint

   cpuset             cpus   isolated   cpus_allowed   isolated_map
 top                  0-7    1          0              0-7
 top/lowlat           0-1    1          0-1            0
 top/others           2-7    1          6-7            2-5
 top/others/cint      2-5    1          2-5            0
 top/others/batch     6-7    0          6-7            0

As cint is already isolated, adding cpus causes the sched domains covering
its cpus_allowed and its parent's cpus_allowed to be rebuilt, so the new
sched domains will look as follows

[2-5] [6-7]

cpus 0-1 are of course still not affected

Similarly the admin can remove cpus from cint, which will
result in the domains being rebuilt to what was before

[2-3] [4-7]

---


Hope this clears up my approach. Also note that w

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-19 Thread Dinakar Guniguntala
On Tue, Apr 19, 2005 at 04:19:35PM +1000, Nick Piggin wrote:

[...Snip...]
> Though I imagine this becomes a complete superset of the
> isolcpus= functionality, and it would actually be easier to
> manage a single isolated CPU and its associated processes with
> the cpusets interfaces after this.

That is the idea, though I think that we need to be able to
provide users the option of not doing a load balance within a
sched domain

> It doesn't work if you have *most* jobs bound to either
> {0, 1, 2, 3} or {4, 5, 6, 7} but one which should be allowed
> to use any CPU from 0-7.

That is the current definition of cpu_exclusive on cpusets.
I initially thought of attaching exclusive cpusets to sched domains,
but that would not work because of this reason

> > 
> > In the case of cpus, we really do prefer the partitions to be
> > disjoint, because it would be better not to confuse the domain
> > scheduler with overlapping domains.
> > 
> 
> Yes. The domain scheduler can't handle this at all, it would
> have to fall back on cpus_allowed, which in turn can create
> big problems for multiprocessor balancing.
> 

I agree

> From what I gather, this partitioning does not exactly fit
> the cpusets architecture. Because with cpusets you are specifying
> on what cpus can a set of tasks run, not dividing the whole system.

Since isolated cpusets are trying to partition the system, this
can be restricted to only the first level of cpusets. Keeping in mind
that we have a flat sched domain hierarchy, I think this would
probably simplify the update_sched_domains function quite a bit.

Also I think we can add further restrictions in terms of not being able
to change (add/remove) cpus within an isolated cpuset. Instead one would
have to tear down an existing cpuset and make a new one with the
required configuration. That would simplify things even further.

> The sched-domains setup code will take care of all that for you
> already. It won't know or care about the partitions. If you
> partition a 64-way system into 2 32-ways, the domain setup code
> will just think it is setting up a 32-way system.
> 
> Don't worry about the sched-domains side of things at all, that's
> pretty easy. Basically you just have to know that it has the
> capability to partition the system in an arbitrary disjoint set
> of sets of cpus.

And maybe also have a flag that says whether to have load balancing
in this domain or not

-Dinakar


Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-19 Thread Dinakar Guniguntala
On Mon, Apr 18, 2005 at 10:54:27PM -0700, Paul Jackson wrote:
> Hmmm ... interesting patch.  My reaction to the changes in
> kernel/cpuset.c are complicated:

Thanks Paul for taking time off your vacation to reply to this.
I was expecting to see one of your huge mails but this has
exceeded all my expectations :)

>  * I'd probably ditch the all_cpus() macro, on the
>concern that it obfuscates more than it helps.
>  * The need for _both_ a per-cpuset flag 'CS_CPU_ISOLATED'
>and another per-cpuset mask 'isolated_map' concerns me.
>I guess that the isolated_map is just a cache of the
>set of CPUs isolated in child cpusets, not an independently
>settable mask, but it needs to be clearly marked as such
>if so.

Currently the isolated_map is read-only as you have guessed.
I did think of the user adding cpus to this map from the 
cpus_allowed mask but thought the current approach made more sense

>  * Some code lines go past column 80.
I need to set my vi to wrap past 80...

>  * The name 'isolated'  probably won't work.  There is already
>a boottime option "isolcpus=..." for 'isolated' cpus which
>is (I think ?) rather different.  Perhaps a better name will
>fall out of the conceptual discussion, below.

I was hoping that by the time we are done with this, we would
be able to completely get rid of the isolcpus= option. For that,
of course, we need to be able to build domains that don't run
load balance.

>  * The change to the output format of the special cpuset file
>'cpus', to look like '0-3[4-7]' bothers me in a couple of
>ways.  It complicates the format from being a simple list.
>And it means that the output format is not the same as the
>input format (you can't just write back what you read from
>such a file anymore).

As I had said in my earlier mail, this was just one way of
representing what I call isolated cpus. The other was to expose
isolated_map to userspace and move cpus between cpus_allowed
and isolated_map.

>  * Several comments start with the word 'Set', as in:
>   Set isolated ON on a non exclusive cpuset
>Such wording suggests to me that something is being set,
>some bit or value changed or turned on.  But in each case,
>you are just testing for some condition that will return
>or error out.  Some phrasing such as "If ..." or other
>conditional would be clearer.

The wording was from the user's point of view for what
action was being done; I guess I'll change that.

>  * The update_sched_domains() routine is complicated, and
>hence a primary clue that the conceptual model is not
>clean yet.

It is complicated because it has to handle all of the different
possible actions that the user can initiate. It can be simplified
if we have stricter rules about what the user can/cannot do
w.r.t. isolated cpusets.

>  * None of this was explained in Documentation/cpusets.txt.

Yes I plan to add the documentation shortly

>  * Too bad that cpuset_common_file_write() has to have special
>logic for this isolated case.  The other flag settings just
>turn on and off the associated bit, and don't trigger any
>kernel code to adapt to new cpu or memory settings.  We
>should make an exception to that behaviour only if we must,
>and then we must be explicit about the exception.

See my notes on isolated_map above

> First, let me verify one thing.  I understand that the _key_
> purpose of your patch is not so much to isolate cpus, as it
> is to allow for structuring scheduling domains to align with
> cpuset boundaries.  I understand real isolated cpus to be ones
> that don't have a scheduling domain (have only the dummy one),
> as requested by the "isolcpus=..." boot flag.

Not really. Isolated cpusets allow you to do a soft partition
of the system, and it would make sense to continue to have load
balancing within these partitions. I would think not having
load balancing should be one of the options available.

> 
> Second, let me describe how this same issue shows up on the
> memory side.
> 

...snip...

> 
> 
> In the case of cpus, we really do prefer the partitions to be
> disjoint, because it would be better not to confuse the domain
> scheduler with overlapping domains.

Absolutely. One of the problems I had was to map the flat disjoint
hierarchy of sched domains to the tree-like hierarchy of cpusets.

> 
> In the case of memory, we technically probably don't _have_ to
> keep the partitions disjoint.  I doubt that the page allocator
> (mm/page_alloc.c:__alloc_pages()) really cares.  It will strive
> valiantly to satisfy the memory request from any of the zones
> (each node specific) in the list passed into it.
> 
I must confess that I haven't looked at the memory side all that much,
having more interest in trying to build soft-partitioning of the cpus.

> But for the purposes of providing a clear conceptual model to
> our users, I think it is best that we impose this constraint on
> the memory side as well as on the cpu si

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-19 Thread Dinakar Guniguntala
On Tue, Apr 19, 2005 at 09:44:06AM +1000, Nick Piggin wrote:
> Very good, I was wondering when someone would try to implement this ;)

Thank you for the feedback !

> >-static void __devinit arch_init_sched_domains(void)
> >+static void attach_domains(cpumask_t cpu_map)
> > {
> 
> This shouldn't be needed. There should probably just be one place that
> attaches all domains. It is a bit difficult to explain what I mean when
> you have 2 such places below.
> 

Can you explain a bit more, not sure I understand what you mean

> Interface isn't bad. It would seem to be able to handle everything, but
> I think it can be made a bit simpler.
> 
>   fn_name(cpumask_t span1, cpumask_t span2)
> 
> Yeah? The change_map is implicitly the union of the 2 spans. Also I don't
> really like the name. It doesn't rebuild so much as split (or join). I
> can't think of anything good off the top of my head.

Yeah agreed. It kinda lived on from earlier versions I had

> 
> >+unsigned long flags;
> >+int i;
> >+
> >+local_irq_save(flags);
> >+
> >+for_each_cpu_mask(i, change_map)
> >+spin_lock(&cpu_rq(i)->lock);
> >+
> 
> Locking is wrong. And it has changed again in the latest -mm kernel.
> Please diff against that.
> 

I haven't looked at the RCU sched domain changes as yet, but I put this in
to address some problems I noticed during stress testing.
Basically, with the current hotplug code, it is possible to have a scenario
like this:

    rebuild domains                load balance
          |                             |
          |                   take existing sd pointer
          |                             |
  attach to dummy domain                |
          |                   loop thro sched groups
  change sched group info               |
                              access invalid pointer and panic


> >+if (!cpus_empty(span1))
> >+build_sched_domains(span1);
> >+if (!cpus_empty(span2))
> >+build_sched_domains(span2);
> >+
> 
> You also can't do this - you have to 'offline' the domains first before
> building new ones. See the CPU hotplug code that handles this.
> 

By offline if you mean attach to dummy domain, see above

> This makes a hotplug event destroy your nicely set up isolated domains,
> doesn't it?
> 
> This looks like the most difficult problem to overcome. It needs some
> external information to redo the cpuset splits at cpu hotplug time.
> Probably a hotplug handler in the cpusets code might be the best way
> to do that.

Yes, I am aware of this. What I have in mind is for the hotplug code
in the scheduler to call into the cpusets code. This will just return, say, 1
when cpusets is not compiled in, and the sched code can continue to do
what it is doing right now; else the cpusets code will find the leaf
cpuset that contains the hotplugged cpu and rebuild the domains accordingly.
However, the question still remains as to how cpusets should handle
this hotplugged cpu.

-Dinakar


[RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-18 Thread Dinakar Guniguntala

Here's an attempt at dynamic sched domains aka isolated cpusets

o  This functionality is on top of CPUSETs and provides a way to
   completely isolate any set of CPUs dynamically.
o  There is a new cpu_isolated flag that allows users to convert
   an exclusive cpuset to an isolated one
o  The isolated CPUs are part of their own sched domain.
   This ensures that the rebalance code works within the domain,
   prevents overhead due to a cpu trying to pull tasks only to find
   that its cpus_allowed mask does not allow it to be pulled.
   However it does not kick existing processes off the isolated domain
o  There is very little code change in the scheduler sched domain
   code. Most of it is just splitting up of the arch_init_sched_domains
   code to be called dynamically instead of only at boot time.
   It has only one API which takes in the map of all cpus affected
   and the two new domains to be built

   rebuild_sched_domains(cpumask_t change_map, cpumask_t span1, cpumask_t span2)
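
   As a quick illustration of this API (a hypothetical caller, not part of
   the patch), producing the "0-3[4-7]" split shown further below would look
   roughly like:

        /*
         * Hypothetical usage sketch: split an 8-way box so that CPUs 4-7
         * form their own sched domain.  change_map is the union of the two
         * spans being rebuilt, as the API above expects.
         */
        static void example_isolate_4_7(void)
        {
                cpumask_t span1 = CPU_MASK_NONE, span2 = CPU_MASK_NONE;
                cpumask_t change_map;
                int cpu;

                for (cpu = 0; cpu <= 3; cpu++)
                        cpu_set(cpu, span1);
                for (cpu = 4; cpu <= 7; cpu++)
                        cpu_set(cpu, span2);

                cpus_or(change_map, span1, span2);
                rebuild_sched_domains(change_map, span1, span2);
        }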


There are some things that may/will change

o  This has been tested only on x86 [8-way -> 4-way with HT]. Still
   needs work on other arches
o  I didn't get a chance to see Nick Piggin's RCU sched domains code
   as yet, but I know there would be changes here because of that...
o  This does not support CPU hotplug as yet
o  Making a cpuset isolated manipulates its parent's cpus_allowed
   mask. When viewed from userspace this is represented as follows

   [EMAIL PROTECTED] cpusets] cat cpus
   0-3[4-7]

   This indicates that CPUs 4-7 are isolated and is/are part of some
   child cpuset/s

Appreciate any feedback.

Patch against linux-2.6.12-rc1-mm1.

 include/linux/init.h  |    2
 include/linux/sched.h |    1
 kernel/cpuset.c       |  141 --
 kernel/sched.c        |  109 +-
 4 files changed, 213 insertions(+), 40 deletions(-)


-Dinakar

diff -Naurp linux-2.6.12-rc1-mm1.orig/include/linux/init.h 
linux-2.6.12-rc1-mm1/include/linux/init.h
--- linux-2.6.12-rc1-mm1.orig/include/linux/init.h  2005-03-18 
07:03:49.0 +0530
+++ linux-2.6.12-rc1-mm1/include/linux/init.h   2005-04-18 00:48:26.0 
+0530
@@ -217,7 +217,7 @@ void __init parse_early_param(void);
 #define __initdata_or_module __initdata
 #endif /*CONFIG_MODULES*/
 
-#ifdef CONFIG_HOTPLUG
+#if defined(CONFIG_HOTPLUG) || defined(CONFIG_CPUSETS)
 #define __devinit
 #define __devinitdata
 #define __devexit
diff -Naurp linux-2.6.12-rc1-mm1.orig/include/linux/sched.h 
linux-2.6.12-rc1-mm1/include/linux/sched.h
--- linux-2.6.12-rc1-mm1.orig/include/linux/sched.h 2005-04-18 
00:46:40.0 +0530
+++ linux-2.6.12-rc1-mm1/include/linux/sched.h  2005-04-18 00:48:19.0 
+0530
@@ -155,6 +155,7 @@ typedef struct task_struct task_t;
 extern void sched_init(void);
 extern void sched_init_smp(void);
 extern void init_idle(task_t *idle, int cpu);
+extern void rebuild_sched_domains(cpumask_t change_map, cpumask_t span1, 
cpumask_t span2);
 
 extern cpumask_t nohz_cpu_mask;
 
diff -Naurp linux-2.6.12-rc1-mm1.orig/kernel/cpuset.c 
linux-2.6.12-rc1-mm1/kernel/cpuset.c
--- linux-2.6.12-rc1-mm1.orig/kernel/cpuset.c   2005-04-18 00:46:40.0 
+0530
+++ linux-2.6.12-rc1-mm1/kernel/cpuset.c2005-04-18 00:51:48.0 
+0530
@@ -55,9 +55,17 @@
 
 #define CPUSET_SUPER_MAGIC 0x27e0eb
 
+#define all_cpus(cs)   \
+({ \
+   cpumask_t __tmp_map;\
+   cpus_or(__tmp_map, cs->cpus_allowed, cs->isolated_map); \
+   __tmp_map;  \
+})
+
 struct cpuset {
unsigned long flags;/* "unsigned long" so bitops work */
cpumask_t cpus_allowed; /* CPUs allowed to tasks in cpuset */
+   cpumask_t isolated_map; /* CPUs associated with a sched domain 
*/
nodemask_t mems_allowed;/* Memory Nodes allowed to tasks */
 
atomic_t count; /* count tasks using this cpuset */
@@ -82,6 +90,7 @@ struct cpuset {
 /* bits in struct cpuset flags field */
 typedef enum {
CS_CPU_EXCLUSIVE,
+   CS_CPU_ISOLATED,
CS_MEM_EXCLUSIVE,
CS_REMOVED,
CS_NOTIFY_ON_RELEASE
@@ -93,6 +102,11 @@ static inline int is_cpu_exclusive(const
return !!test_bit(CS_CPU_EXCLUSIVE, &cs->flags);
 }
 
+static inline int is_cpu_isolated(const struct cpuset *cs)
+{
+   return !!test_bit(CS_CPU_ISOLATED, &cs->flags);
+}
+
 static inline int is_mem_exclusive(const struct cpuset *cs)
 {
return !!test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
@@ -127,8 +141,9 @@ static inline int notify_on_release(cons
 static atomic_t cpuset_mems_generation = ATOMIC_INIT(1);
 
 static struct cpuset top_cpuset = {
-   .flags = ((1 << CS_CPU_EXCLUSIVE) | (1 << CS_MEM_EXCLUSIV

Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

2005-02-08 Thread Dinakar Guniguntala
On Mon, Feb 07, 2005 at 03:59:49PM -0800, Matthew Dobson wrote:

> Sorry to reply a long quiet thread, but I've been trading emails with Paul 
> Jackson on this subject recently, and I've been unable to convince either 
> him or myself that merging CPUSETs and CKRM is as easy as I once believed.  
> I'm still convinced the CPU side is doable, but I haven't managed as much 
> success with the memory binding side of CPUSETs.  In light of this, I'd 
> like to remove my previous objections to CPUSETs moving forward.  If others 
> still have things they want discussed before CPUSETs moves into mainline, 
> that's fine, but it seems to me that CPUSETs offer legitimate functionality 
> and that the code has certainly "done its time" in -mm to convince me it's 
> stable and usable.
> 
> -Matt
> 

What about your proposed sched domain changes?
Can't sched domains be used to handle the CPU groupings, with the
existing code in cpusets that handles memory continuing as is?
Weren't sched domains supposed to give the scheduler better knowledge
of the CPU groupings after all?

Regards,

Dinakar