Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-12 Thread Srikar Dronamraju
* Nathan Lynch [2019-09-12 13:15:03]:

> Srikar Dronamraju writes:
> 
> >> 
> >> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
> >> experience, the downstream effects of violating this condition are
> >> varied and quite difficult to debug. Seems only appropriate to emit a
> >> warning and stack trace before the OS inevitably becomes unstable.
> >
> > I still have to try but wouldn't this be a problem for the boot-cpu?
> > I mean boot-cpu would be marked online while it tries to do numa_setup_cpu.
> > No?
> 
> This is what I mean:
> 
>  +  if (fcpu != lcpu) {
>  +      WARN_ON(cpu_online(fcpu));
>  +      map_cpu_to_node(fcpu, nid);
>  +  }
> 

Yes this should work. Will send an updated patch with this change.

> I.e. if we're modifying the mapping for a remote cpu, warn if it's
> online.
> 
> I don't think this would warn on the boot cpu -- I would expect fcpu and
> lcpu to be the same and this branch would not be taken.

-- 
Thanks and Regards
Srikar Dronamraju



Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-12 Thread Nathan Lynch
Srikar Dronamraju writes:

>> 
>> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
>> experience, the downstream effects of violating this condition are
>> varied and quite difficult to debug. Seems only appropriate to emit a
>> warning and stack trace before the OS inevitably becomes unstable.
>
> I still have to try but wouldn't this be a problem for the boot-cpu?
> I mean boot-cpu would be marked online while it tries to do numa_setup_cpu.
> No?

This is what I mean:

 +  if (fcpu != lcpu) {
 +      WARN_ON(cpu_online(fcpu));
 +      map_cpu_to_node(fcpu, nid);
 +  }

I.e. if we're modifying the mapping for a remote cpu, warn if it's
online.

I don't think this would warn on the boot cpu -- I would expect fcpu and
lcpu to be the same and this branch would not be taken.
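
To illustrate the boot cpu case, here is a minimal userspace sketch of the
guard. It is not kernel code: cpu_online_map, node_of_cpu, warn_count and
update_first_sibling() are hypothetical stand-ins for the online mask,
numa_cpu_lookup_table, WARN_ON() and the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 16

static bool cpu_online_map[NR_CPUS]; /* stand-in for the kernel's online mask */
static int node_of_cpu[NR_CPUS];     /* stand-in for numa_cpu_lookup_table */
static int warn_count;               /* counts where WARN_ON() would fire */

/* Model of the proposed hunk plus the map_cpu_to_node(lcpu, nid) after it. */
static void update_first_sibling(int fcpu, int lcpu, int nid)
{
	if (fcpu != lcpu) {
		/* WARN_ON(cpu_online(fcpu)) in the kernel version */
		if (cpu_online_map[fcpu])
			warn_count++;
		node_of_cpu[fcpu] = nid;
	}
	node_of_cpu[lcpu] = nid;
}
```

With an online boot cpu that is its own first sibling (fcpu == lcpu), the
branch is skipped and warn_count stays at zero. It only increments when a
secondary thread remaps a first sibling that is already online.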


Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-12 Thread Srikar Dronamraju
> 
> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
> experience, the downstream effects of violating this condition are
> varied and quite difficult to debug. Seems only appropriate to emit a
> warning and stack trace before the OS inevitably becomes unstable.

I still have to try but wouldn't this be a problem for the boot-cpu?
I mean boot-cpu would be marked online while it tries to do numa_setup_cpu.
No?

-- 
Thanks and Regards
Srikar Dronamraju



Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-12 Thread Nathan Lynch
Hi Srikar,

Srikar Dronamraju writes:
>> > @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
>> >    if (nid < 0 || !node_possible(nid))
>> >        nid = first_online_node;
>> >  
>> > +  /*
>> > +   * Update for the first thread of the core. All threads of a core
>> > +   * have to be part of the same node. This not only avoids querying
>> > +   * for every other thread in the core, but always avoids a case
>> > +   * where virtual node associativity change causes subsequent threads
>> > +   * of a core to be associated with different nid.
>> > +   */
>> > +  if (fcpu != lcpu)
>> > +      map_cpu_to_node(fcpu, nid);
>> > +
>> 
>> OK, I see that this somewhat addresses my concern above. But changing
>> this mapping for a remote cpu is unsafe except under specific
>> circumstances. I think this should first assert:
>> 
>> * numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
>> * cpu_online(fcpu) == false
>> 
>> to document and enforce the conditions that must hold for this to be OK.
>
> I do understand that we shouldn't be modifying the nid for a different cpu.
>
> We just checked above that the mapping for the first cpu doesn't exist.
> If the first cpu (or remote cpu as you coin it) was online, then its
> mapping should have existed and we return even before we come here.

I agree that is how the code will work with your change, and I'm fine
with simply warning if fcpu is online.

The point is to make this rule more explicit in the code for the benefit
of future readers and to catch violations of it by future changes. There
is a fair amount of code remaining in this file and elsewhere in
arch/powerpc that was written under the impression that changing the
cpu-node relationship at runtime is OK.


> nid = numa_cpu_lookup_table[fcpu];
> if (nid >= 0) {
>   map_cpu_to_node(lcpu, nid);
>   return nid;
> }
>
> Currently numa_setup_cpu is only called at very early boot and in cpu
> hotplug. At hotplug time, the onlining of cpus is serialized. Right? Do we
> see a chance of remote cpu changing its state as we set its nid here?
>
> Also, let's say we assert and for some unknown reason the assertion fails.
> How do we handle the failure case?  We can't get out without setting
> the nid. We can't continue setting the nid. Should we panic the system given
> that the check a few lines above is now turning out to be false? Probably
> not, as I think we can live with it.
>
> Any thoughts?

I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
experience, the downstream effects of violating this condition are
varied and quite difficult to debug. Seems only appropriate to emit a
warning and stack trace before the OS inevitably becomes unstable.


Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-11 Thread Srikar Dronamraju
Hi Nathan, 

Thanks for your reviews.

> > -   if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
> > +   nid = numa_cpu_lookup_table[fcpu];
> > +   if (nid >= 0) {
> >         map_cpu_to_node(lcpu, nid);
> >         return nid;
> >     }
> 
> Yes, we need to do something like this to prevent a VPHN change that occurs
> concurrently with onlining a core's threads from messing us up.
> 
> Is it a good assumption that the first thread of a sibling group will
> have its mapping initialized first? I think the answer is yes for boot,
> but hotplug... not so sure.
> 
> 
> > @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
> >     if (nid < 0 || !node_possible(nid))
> >         nid = first_online_node;
> >  
> > +   /*
> > +    * Update for the first thread of the core. All threads of a core
> > +    * have to be part of the same node. This not only avoids querying
> > +    * for every other thread in the core, but always avoids a case
> > +    * where virtual node associativity change causes subsequent threads
> > +    * of a core to be associated with different nid.
> > +    */
> > +   if (fcpu != lcpu)
> > +       map_cpu_to_node(fcpu, nid);
> > +
> 
> OK, I see that this somewhat addresses my concern above. But changing
> this mapping for a remote cpu is unsafe except under specific
> circumstances. I think this should first assert:
> 
> * numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
> * cpu_online(fcpu) == false
> 
> to document and enforce the conditions that must hold for this to be OK.

I do understand that we shouldn't be modifying the nid for a different cpu.

We just checked above that the mapping for the first cpu doesn't exist.
If the first cpu (or remote cpu as you coin it) was online, then its
mapping should have existed and we return even before we come here.

nid = numa_cpu_lookup_table[fcpu];
if (nid >= 0) {
    map_cpu_to_node(lcpu, nid);
    return nid;
}

Currently numa_setup_cpu is only called at very early boot and in cpu
hotplug. At hotplug time, the onlining of cpus is serialized. Right? Do we
see a chance of remote cpu changing its state as we set its nid here?

Also, let's say we assert and for some unknown reason the assertion fails.
How do we handle the failure case?  We can't get out without setting
the nid. We can't continue setting the nid. Should we panic the system given
that the check a few lines above is now turning out to be false? Probably
not, as I think we can live with it.

Any thoughts?

-- 
Thanks and Regards
Srikar Dronamraju



Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-11 Thread Nathan Lynch
Hi Srikar,

Srikar Dronamraju writes:
> @@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   */
>  static int numa_setup_cpu(unsigned long lcpu)
>  {
> - int nid = NUMA_NO_NODE;
>   struct device_node *cpu;
> + int fcpu = cpu_first_thread_sibling(lcpu);
> + int nid = NUMA_NO_NODE;
>  
>   /*
>    * If a valid cpu-to-node mapping is already available, use it
>    * directly instead of querying the firmware, since it represents
>    * the most recent mapping notified to us by the platform (eg: VPHN).
> +  * Since cpu_to_node binding remains the same for all threads in the
> +  * core. If a valid cpu-to-node mapping is already available, for
> +  * the first thread in the core, use it.
>    */
> - if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
> + nid = numa_cpu_lookup_table[fcpu];
> + if (nid >= 0) {
>       map_cpu_to_node(lcpu, nid);
>       return nid;
>   }

Yes, we need to do something like this to prevent a VPHN change that occurs
concurrently with onlining a core's threads from messing us up.

Is it a good assumption that the first thread of a sibling group will
have its mapping initialized first? I think the answer is yes for boot,
but hotplug... not so sure.
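
For reference, the first-sibling lookup boils down to masking off the
thread bits of the cpu number. The following is a userspace sketch under
the assumption that threads_per_core is a power of two;
first_thread_sibling() is a hypothetical stand-in for
cpu_first_thread_sibling(), not the kernel helper itself.

```c
#include <assert.h>

/*
 * Userspace model of cpu_first_thread_sibling(). threads_per_core is
 * assumed to be a power of two, which is what makes the mask valid.
 */
static int first_thread_sibling(int cpu, int threads_per_core)
{
	return cpu & ~(threads_per_core - 1);
}
```

With 8 threads per core, cpus 8 through 15 all resolve to cpu 8. So if
hotplug brings up cpu 10 before cpu 8, the lookup for cpu 10 still consults
the table entry for cpu 8, which may not have been initialized yet.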


> @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
>   if (nid < 0 || !node_possible(nid))
>       nid = first_online_node;
>  
> + /*
> +  * Update for the first thread of the core. All threads of a core
> +  * have to be part of the same node. This not only avoids querying
> +  * for every other thread in the core, but always avoids a case
> +  * where virtual node associativity change causes subsequent threads
> +  * of a core to be associated with different nid.
> +  */
> + if (fcpu != lcpu)
> +     map_cpu_to_node(fcpu, nid);
> +

OK, I see that this somewhat addresses my concern above. But changing
this mapping for a remote cpu is unsafe except under specific
circumstances. I think this should first assert:

* numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
* cpu_online(fcpu) == false

to document and enforce the conditions that must hold for this to be OK.
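
As a rough sketch of what checking those two conditions amounts to, here
is a userspace model. struct cpu_state and remote_remap_is_safe() are
hypothetical names introduced for illustration only; in the kernel the
inputs would come from numa_cpu_lookup_table[] and cpu_online().

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical snapshot of the state of one cpu. */
struct cpu_state {
	int nid;     /* models numa_cpu_lookup_table[cpu] */
	bool online; /* models cpu_online(cpu) */
};

/* True only when both conditions listed above hold for the remote cpu. */
static bool remote_remap_is_safe(const struct cpu_state *fcpu)
{
	return fcpu->nid == NUMA_NO_NODE && !fcpu->online;
}
```

Either an existing mapping or an online first sibling makes the check fail;
how to react to that failure (warn, bail out, or something else) is a
separate question.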


[PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread

2019-09-06 Thread Srikar Dronamraju
All the sibling threads of a core have to be part of the same node.
To ensure that all the sibling threads map to the same node, always
lookup/update the cpu-to-node map of the first thread in the core.

Signed-off-by: Srikar Dronamraju 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran 
Reported-by: Abdul Haleem 
---
 arch/powerpc/mm/numa.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8fbe57c..d0af9a2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
  */
 static int numa_setup_cpu(unsigned long lcpu)
 {
-   int nid = NUMA_NO_NODE;
    struct device_node *cpu;
+   int fcpu = cpu_first_thread_sibling(lcpu);
+   int nid = NUMA_NO_NODE;
 
    /*
     * If a valid cpu-to-node mapping is already available, use it
     * directly instead of querying the firmware, since it represents
     * the most recent mapping notified to us by the platform (eg: VPHN).
+    * Since cpu_to_node binding remains the same for all threads in the
+    * core. If a valid cpu-to-node mapping is already available, for
+    * the first thread in the core, use it.
     */
-   if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
+   nid = numa_cpu_lookup_table[fcpu];
+   if (nid >= 0) {
        map_cpu_to_node(lcpu, nid);
        return nid;
    }
@@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
    if (nid < 0 || !node_possible(nid))
        nid = first_online_node;
 
+   /*
+    * Update for the first thread of the core. All threads of a core
+    * have to be part of the same node. This not only avoids querying
+    * for every other thread in the core, but always avoids a case
+    * where virtual node associativity change causes subsequent threads
+    * of a core to be associated with different nid.
+    */
+   if (fcpu != lcpu)
+       map_cpu_to_node(fcpu, nid);
+
    map_cpu_to_node(lcpu, nid);
    of_node_put(cpu);
 out:
-- 
1.8.3.1
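
The overall flow of the patched function can be modelled in userspace as
below. This is only a sketch under stated assumptions: lookup_table,
firmware_nid, query_firmware(), reset_model() and model_numa_setup_cpu()
are hypothetical names, the firmware stub replaces the real device-tree
lookup, THREADS_PER_CORE is assumed to be a power of two, and the
node_possible() fallback is left out.

```c
#include <assert.h>

#define NUMA_NO_NODE     (-1)
#define NR_CPUS          16
#define THREADS_PER_CORE 8 /* assumed power of two */

static int lookup_table[NR_CPUS]; /* models numa_cpu_lookup_table */
static int firmware_nid;          /* what the stubbed firmware query reports */
static int firmware_queries;      /* how many times the stub was consulted */

/* Mark every cpu as unmapped and clear the query counter. */
static void reset_model(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		lookup_table[i] = NUMA_NO_NODE;
	firmware_queries = 0;
}

/* Stub for the firmware lookup; the answer may change between calls. */
static int query_firmware(int cpu)
{
	(void)cpu;
	firmware_queries++;
	return firmware_nid;
}

static int model_numa_setup_cpu(int lcpu)
{
	int fcpu = lcpu & ~(THREADS_PER_CORE - 1);
	int nid = lookup_table[fcpu];

	/* Fast path: reuse whatever the first sibling is already mapped to. */
	if (nid >= 0) {
		lookup_table[lcpu] = nid;
		return nid;
	}

	nid = query_firmware(lcpu);

	/* Record the answer for the first sibling so later threads reuse it. */
	if (fcpu != lcpu)
		lookup_table[fcpu] = nid;
	lookup_table[lcpu] = nid;
	return nid;
}
```

In this model, once any thread of a core has been set up, a later change
in the firmware answer no longer affects the remaining threads of that
core, whichever thread happened to be set up first.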