Re: [PATCH 3/3] x86/intel_rdt/cqm: Improve limbo list processing

2017-08-14 Thread Shivappa Vikas

On Mon, 14 Aug 2017, Shivappa Vikas wrote:

On Mon, 14 Aug 2017, Thomas Gleixner wrote:


On Wed, 9 Aug 2017, Vikas Shivappa wrote:

@@ -426,6 +426,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
					   GFP_KERNEL);
		if (!d->rmid_busy_llc)
			return -ENOMEM;
+		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+		if (has_busy_rmid(r, d))
+			cqm_setup_limbo_handler(d);


This is beyond silly. d->rmid_busy_llc is allocated a few lines above. How
would a bit be set here?


If we logically offline all CPUs in a package and bring it back, the worker
needs to be scheduled on that package if there were busy RMIDs on it.
Otherwise such an RMID never gets freed, as its rmid->busy stays at 1.


I needed to scan the limbo list and set the bits for all limbo RMIDs after
the allocation and before doing the 'has_busy_rmid' check. Will fix.


Tony pointed out that there is no guarantee that a domain will come back up
once it goes down, so the above issue of rmid->busy staying at > 0 can still
happen. So I will delete this -

	if (has_busy_rmid(r, d))
		cqm_setup_limbo_handler(d);

and add this when a domain is powered down -

	for each rmid set in d->rmid_busy_llc
		if (!--entry->busy)
			free_rmid(rmid);
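
A minimal C sketch of that teardown path, assuming helpers like
__rmid_entry() (map an RMID to its limbo entry) and free_rmid() (return it
to the free list) - the names here are illustrative, not the final patch:

/*
 * Sketch: when a domain goes down for good, drop its contribution to
 * each limbo RMID's busy count and free any RMID that is no longer
 * busy on any package. Helper names are assumed for illustration.
 */
static void domain_free_busy_rmids(struct rdt_resource *r,
				   struct rdt_domain *d)
{
	struct rmid_entry *entry;
	u32 rmid;

	for_each_set_bit(rmid, d->rmid_busy_llc, r->num_rmid) {
		entry = __rmid_entry(rmid);
		if (!--entry->busy)
			free_rmid(rmid);
	}
}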

We have no way to know whether the L3 was indeed flushed (or the package was
powered off). This may lead to incorrect counts in rare scenarios, but we can
document that.


Thanks,
vikas


Re: [PATCH 3/3] x86/intel_rdt/cqm: Improve limbo list processing

2017-08-14 Thread Shivappa Vikas

On Mon, 14 Aug 2017, Thomas Gleixner wrote:


On Wed, 9 Aug 2017, Vikas Shivappa wrote:


@@ -426,6 +426,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
					   GFP_KERNEL);
		if (!d->rmid_busy_llc)
			return -ENOMEM;
+		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+		if (has_busy_rmid(r, d))
+			cqm_setup_limbo_handler(d);


This is beyond silly. d->rmid_busy_llc is allocated a few lines above. How
would a bit be set here?


If we logically offline all CPUs in a package and bring it back, the worker
needs to be scheduled on that package if there were busy RMIDs on it.
Otherwise such an RMID never gets freed, as its rmid->busy stays at 1.


I needed to scan the limbo list and set the bits for all limbo RMIDs after
the allocation and before doing the 'has_busy_rmid' check. Will fix.

	}
	if (is_mbm_total_enabled()) {
		tsize = sizeof(*d->mbm_total);
@@ -536,11 +539,25 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
		list_del(&d->list);
		if (is_mbm_enabled())
			cancel_delayed_work(&d->mbm_over);
+
+		if (is_llc_occupancy_enabled() &&
+		    has_busy_rmid(r, d))


What is that line break helping here and why can't you just unconditionally
cancel the work?


Will fix the line break. The has_busy_rmid() check makes sure the worker was
indeed scheduled - that way we only cancel a worker which was actually
scheduled.

+			cancel_delayed_work(&d->cqm_limbo);
+
		kfree(d);
-	} else if (r == &rdt_resources_all[RDT_RESOURCE_L3] &&
-		   cpu == d->mbm_work_cpu && is_mbm_enabled()) {
-		cancel_delayed_work(&d->mbm_over);
-		mbm_setup_overflow_handler(d);
+		return;
+	}
+
+	if (r == &rdt_resources_all[RDT_RESOURCE_L3]) {
+		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
+			cancel_delayed_work(&d->mbm_over);
+			mbm_setup_overflow_handler(d);


I think this is the wrong approach. If the timer is about to fire you
essentially double the interval. So you better flush the work, which will
reschedule it if needed.


Ok, will fix. We can flush (set up and run immediately) the work here on the
new CPU.

+		}
+		if (is_llc_occupancy_enabled() && cpu == d->mbm_work_cpu &&


That wants to be d->cqm_work_cpu, right?


Correct - thanks for pointing that out, will fix.

+		    has_busy_rmid(r, d)) {
+			cancel_delayed_work(&d->cqm_limbo);
+			cqm_setup_limbo_handler(d);


See above.


For CQM the 1s interval is not a hard requirement, but we can flush the work
like MBM does to keep it uniform.
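
Roughly, the offline path could then look like the sketch below. The
zero-delay argument to the setup helpers is an assumed extension of their
current one-argument signatures, only meant to show the "run it now on the
new CPU" idea:

	if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
		cancel_delayed_work(&d->mbm_over);
		/* assumed zero-delay variant: pending update runs immediately */
		mbm_setup_overflow_handler(d, 0);
	}
	if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
	    has_busy_rmid(r, d)) {
		cancel_delayed_work(&d->cqm_limbo);
		cqm_setup_limbo_handler(d, 0);
	}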


Thanks,
Vikas

Thanks,

tglx

Re: [PATCH 3/3] x86/intel_rdt/cqm: Improve limbo list processing

2017-08-14 Thread Thomas Gleixner
On Wed, 9 Aug 2017, Vikas Shivappa wrote:

> @@ -426,6 +426,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
> 					   GFP_KERNEL);
> 		if (!d->rmid_busy_llc)
> 			return -ENOMEM;
> +		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
> +		if (has_busy_rmid(r, d))
> +			cqm_setup_limbo_handler(d);

This is beyond silly. d->rmid_busy_llc is allocated a few lines above. How
would a bit be set here?

> 	}
> 	if (is_mbm_total_enabled()) {
> 		tsize = sizeof(*d->mbm_total);
> @@ -536,11 +539,25 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> 		list_del(&d->list);
> 		if (is_mbm_enabled())
> 			cancel_delayed_work(&d->mbm_over);
> +
> +		if (is_llc_occupancy_enabled() &&
> +		    has_busy_rmid(r, d))

What is that line break helping here and why can't you just unconditionally
cancel the work?

> +			cancel_delayed_work(&d->cqm_limbo);
> +
> 		kfree(d);
> -	} else if (r == &rdt_resources_all[RDT_RESOURCE_L3] &&
> -		   cpu == d->mbm_work_cpu && is_mbm_enabled()) {
> -		cancel_delayed_work(&d->mbm_over);
> -		mbm_setup_overflow_handler(d);
> +		return;
> +	}
> +
> +	if (r == &rdt_resources_all[RDT_RESOURCE_L3]) {
> +		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> +			cancel_delayed_work(&d->mbm_over);
> +			mbm_setup_overflow_handler(d);

I think this is the wrong approach. If the timer is about to fire you
essentially double the interval. So you better flush the work, which will
reschedule it if needed.

> +		}
> +		if (is_llc_occupancy_enabled() && cpu == d->mbm_work_cpu &&

That wants to be d->cqm_work_cpu, right?

> +		    has_busy_rmid(r, d)) {
> +			cancel_delayed_work(&d->cqm_limbo);
> +			cqm_setup_limbo_handler(d);

See above.

Thanks,

tglx


[PATCH 3/3] x86/intel_rdt/cqm: Improve limbo list processing

2017-08-09 Thread Vikas Shivappa
During a mkdir, we synchronously check the entire limbo list on each
package for free RMIDs by sending IPIs, and in the worst case this can
take an intolerable amount of time in IPIs. For example, on SKL we could
end up reading all 192 RMIDs for llc_occupancy.

Modify this to make some improvements, mainly avoiding the use of IPIs.
We use asynchronous worker threads on each package which periodically
scan the limbo list and move the RMIDs that have

llc_occupancy < threshold_occupancy

on all packages to the free list. The mkdir simply returns -ENOSPC if
both the free list and the limbo list are empty, or -EBUSY if there are
RMIDs on the limbo list but none on the free list. As a side effect of
not accessing the limbo list in parallel from IPIs, we simplify some of
the related data structures.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/kernel/cpu/intel_rdt.c |  25 +++-
 arch/x86/kernel/cpu/intel_rdt.h |  10 ++
 arch/x86/kernel/cpu/intel_rdt_monitor.c | 209 ++--
 3 files changed, 120 insertions(+), 124 deletions(-)
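
For reference, a rough sketch of how the per-domain limbo worker described
above could be shaped, modeled on the existing mbm_handle_overflow();
cqm_check_limbo() is an assumed helper that rereads llc_occupancy for the
limbo RMIDs and moves those below the threshold to the free list:

/*
 * Periodic limbo worker: runs on a CPU of its domain, rechecks the
 * limbo RMIDs and re-arms itself while any RMID is still busy.
 */
void cqm_handle_limbo(struct work_struct *work)
{
	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
	struct rdt_domain *d;

	mutex_lock(&rdtgroup_mutex);

	d = container_of(work, struct rdt_domain, cqm_limbo.work);
	cqm_check_limbo(d);	/* assumed helper, see above */

	if (has_busy_rmid(&rdt_resources_all[RDT_RESOURCE_L3], d))
		schedule_delayed_work_on(d->cqm_work_cpu,
					 &d->cqm_limbo, delay);

	mutex_unlock(&rdtgroup_mutex);
}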

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 97c8d83..792f142 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -426,6 +426,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 					   GFP_KERNEL);
 		if (!d->rmid_busy_llc)
 			return -ENOMEM;
+		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+		if (has_busy_rmid(r, d))
+			cqm_setup_limbo_handler(d);
 	}
 	if (is_mbm_total_enabled()) {
 		tsize = sizeof(*d->mbm_total);
@@ -536,11 +539,25 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
+
+		if (is_llc_occupancy_enabled() &&
+		    has_busy_rmid(r, d))
+			cancel_delayed_work(&d->cqm_limbo);
+
 		kfree(d);
-	} else if (r == &rdt_resources_all[RDT_RESOURCE_L3] &&
-		   cpu == d->mbm_work_cpu && is_mbm_enabled()) {
-		cancel_delayed_work(&d->mbm_over);
-		mbm_setup_overflow_handler(d);
+		return;
+	}
+
+	if (r == &rdt_resources_all[RDT_RESOURCE_L3]) {
+		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
+			cancel_delayed_work(&d->mbm_over);
+			mbm_setup_overflow_handler(d);
+		}
+		if (is_llc_occupancy_enabled() && cpu == d->mbm_work_cpu &&
+		    has_busy_rmid(r, d)) {
+			cancel_delayed_work(&d->cqm_limbo);
+			cqm_setup_limbo_handler(d);
+		}
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 4040bf1..6d57d7e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -20,6 +20,8 @@
 #define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
 #define QOS_L3_MBM_LOCAL_EVENT_ID	0x03
 
+#define CQM_LIMBOCHECK_INTERVAL		1000
+
 #define MBM_CNTR_WIDTH			24
 #define MBM_OVERFLOW_INTERVAL		1000
 
@@ -187,8 +189,11 @@ struct mbm_state {
  * @mbm_total:	saved state for MBM total bandwidth
  * @mbm_local:	saved state for MBM local bandwidth
  * @mbm_over:	worker to periodically read MBM h/w counters
+ * @cqm_limbo:	worker to periodically read CQM h/w counters
  * @mbm_work_cpu:
  *		worker cpu for MBM h/w counters
+ * @cqm_work_cpu:
+ *		worker cpu for CQM h/w counters
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
  * @new_ctrl:	new ctrl value to be loaded
  * @have_new_ctrl:	did user provide new_ctrl for this domain
@@ -201,7 +206,9 @@ struct rdt_domain {
 	struct mbm_state	*mbm_total;
 	struct mbm_state	*mbm_local;
 	struct delayed_work	mbm_over;
+	struct delayed_work	cqm_limbo;
 	int			mbm_work_cpu;
+	int			cqm_work_cpu;
 	u32			*ctrl_val;
 	u32			new_ctrl;
 	bool			have_new_ctrl;
@@ -424,5 +431,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_domain *d,
 		    struct rdtgroup *rdtgrp, int evtid, int first);
 void mbm_setup_overflow_handler(struct rdt_domain *dom);
 void mbm_handle_overflow(struct work_struct *work);
+void cqm_setup_limbo_handler(struct rdt_domain *dom);
+void cqm_handle_limbo(struct work_struct *work);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_monitor.c b/arch/x86/kernel/cpu/intel_rdt_monitor.c
index d6bfdfd..d9c03a0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_monitor.c