Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-20 Thread Daniel Lezcano
On 20/11/2017 08:05, Quan Xu wrote:

[ ... ]

 But the irq_timings stuff is heading into the same direction, with a
 more
 complex prediction logic which should tell you pretty good how long
 that
 idle period is going to be and in case of an interrupt heavy workload
 this
 would skip the extra work of stopping and restarting the tick and
 provide a
 very good input into a polling decision.
>>>
>>> interesting. I have tested with IRQ_TIMINGS related code, which seems
>>> not working so far.
>> I don't know how you tested it, can you elaborate what you meant by
>> "seems not working so far" ?
> 
> Daniel, I tried to enable IRQ_TIMINGS* manually. used
> irq_timings_next_event()
> to return estimation of the earliest interrupt. However I got a constant.

The irq timings gives you an indication of the next interrupt deadline.

This information is a piece of the puzzle, you need to combine it with
the next timer expiration, and the next scheduling event. Then take the
earliest event in a timeline basis.

Using the trivial scheme above will work well with workload like videos
or mp3 but will fail as soon as the interrupts are not coming in a
regular basis and this is where the pattern recognition algorithm must act.

>> There are still some work to do to be more efficient. The prediction
>> based on the irq timings is all right if the interrupts have a simple
>> periodicity. But as soon as there is a pattern, the current code can't
>> handle it properly and does bad predictions.
>>
>> I'm working on a self-learning pattern detection which is too heavy for
>> the kernel, and with it we should be able to detect properly the
>> patterns and re-ajust the period if it changes. I'm in the process of
>> making it suitable for kernel code (both math and perf).
>>
>> One improvement which can be done right now and which can help you is
>> the interrupts rate on the CPU. It is possible to compute it and that
>> will give an accurate information for the polling decision.
>>
>>
> As tglx said, talk to each other / work together to make it usable for
> all use cases.
> could you share how to enable it to get the interrupts rate on the CPU?
> I can try it
> in cloud scenario. of course, I'd like to work with you to improve it.

Sure, I will be glad if we can collaborate. I have some draft code but
before sharing it I would like we define what is the rate and what kind
of information we expect to infer from it. From my point of view it is a
value indicating the interrupt period per CPU, a short value indicates a
high number of interrupts on the CPU.

This value must decay with the time, the question here is what decay
function we apply to the rate from the last timestamp ?




-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-16 Thread Daniel Lezcano
On 16/11/2017 10:12, Quan Xu wrote:
> 
> 
> On 2017-11-16 06:03, Thomas Gleixner wrote:
>> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>>
>>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
 From: Yang Zhang 

 Implement a generic idle poll which resembles the functionality
 found in arch/. Provide weak arch_cpu_idle_poll function which
 can be overridden by the architecture code if needed.
>>> No, we want less of those magic hooks, not more.
>>>
 Interrupts arrive which may not cause a reschedule in idle loops.
 In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
 for interrupts and VM-exit immediately. Also this becomes more
 expensive than bare metal. Add a generic idle poll before enter
 real idle path. When a reschedule event is pending, we can bypass
 the real idle path.
>>> Why not do a HV specific idle driver?
>> If I understand the problem correctly then he wants to avoid the heavy
>> lifting in tick_nohz_idle_enter() in the first place, but there is
>> already
>> an interesting quirk there which makes it exit early.  See commit
>> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this
>> commit
>> looks similar. But lets not proliferate that. I'd rather see that go
>> away.
> 
> agreed.
> 
> Even we can get more benifit than commit 3c5d92a0cfb5 ("nohz: Introduce
> arch_needs_cpu")
> in kvm guest. I won't proliferate that..
> 
>> But the irq_timings stuff is heading into the same direction, with a more
>> complex prediction logic which should tell you pretty good how long that
>> idle period is going to be and in case of an interrupt heavy workload
>> this
>> would skip the extra work of stopping and restarting the tick and
>> provide a
>> very good input into a polling decision.
> 
> 
> interesting. I have tested with IRQ_TIMINGS related code, which seems
> not working so far.

I don't know how you tested it, can you elaborate what you meant by
"seems not working so far" ?

There are still some work to do to be more efficient. The prediction
based on the irq timings is all right if the interrupts have a simple
periodicity. But as soon as there is a pattern, the current code can't
handle it properly and does bad predictions.

I'm working on a self-learning pattern detection which is too heavy for
the kernel, and with it we should be able to detect properly the
patterns and re-ajust the period if it changes. I'm in the process of
making it suitable for kernel code (both math and perf).

One improvement which can be done right now and which can help you is
the interrupts rate on the CPU. It is possible to compute it and that
will give an accurate information for the polling decision.



-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V8 1/3] irq: Add flags to request_percpu_irq function

2017-03-23 Thread Daniel Lezcano
Hi Mark,

On Thu, Mar 23, 2017 at 06:54:52PM +, Mark Rutland wrote:
> Hi Daniel,
> 
> On Thu, Mar 23, 2017 at 06:42:01PM +0100, Daniel Lezcano wrote:
> > In the next changes, we track the interrupts but we discard the timers as
> > that does not make sense. The next interrupt on a timer is predictable.
> 
> Sorry, but I could not parse this. 

I meant we are measuring when are happening the interrupts by getting the local
clock in the interrupt handler. But if the interrupts are coming from a timer, 
it
is not necessary to do that because we already know when they will occur.

So, in order to sort out which interrupt we measure, we use the IRQF_TIMER flag.

Unfortunately, this flag is missing when we do a request_percpu_irq. The
purpose of this patch is to fix that.

> > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > index 9612b84..0f5ab4a 100644
> > --- a/drivers/perf/arm_pmu.c
> > +++ b/drivers/perf/arm_pmu.c
> > @@ -661,7 +661,7 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, 
> > irq_handler_t handler)
> >  
> > irq = platform_get_irq(pmu_device, 0);
> > if (irq > 0 && irq_is_percpu(irq)) {
> > -   err = request_percpu_irq(irq, handler, "arm-pmu",
> > +   err = request_percpu_irq(irq, 0, handler, "arm-pmu",
> >  _events->percpu_pmu);
> > if (err) {
> > pr_err("unable to request IRQ%d for ARM PMU counters\n",
> 
> Please Cc myself and Will Deacon when modifying the arm_pmu driver, as
> per MAINTAINERS. I only spotted this patch by chance.

Ah, ok, sorry for that. Thanks for spotting this, you should have been Cc'ed by
my cccmd script. I will check that.

> This conflicts with arm_pmu changes I have queued for v4.12 [1].
> 
> So, can we leave the prototype of request_percpu_irq() as-is?
> 
> Why not add a new request_percpu_irq_flags() function, and leave
> request_percpu_irq() as a wrapper for that? e.g.

[ ... ]

> static inline int
> request_percpu_irq(unsigned int irq, irq_handler_t handler,
>  const char *devname, void __percpu *percpu_dev_id)
> {
>   return request_percpu_irq_flags(irq, handler, devname,
>   percpu_dev_id, 0);
> }
> 
> ... that would avoid having to touch any non-timer driver for now.

Mmh, yes. That's a good suggestion.

> [...]
> 
> > -request_percpu_irq(unsigned int irq, irq_handler_t handler,
> > -  const char *devname, void __percpu *percpu_dev_id);
> > +request_percpu_irq(unsigned int irq, unsigned long flags,
> > +  irq_handler_t handler,  const char *devname,
> > +  void __percpu *percpu_dev_id);
> >  
> 
> Looking at request_irq, the prototype is:
> 
> int __must_check
> request_irq(unsigned int irq, irq_handler_t handler,
>   unsigned long flags, const char *name,
>   void *dev);
> 
> ... surely it would be better to share the same argument order? i.e.
> 
> int __must_check
> request_percpu_irq(unsigned int irq, irq_handler_t handler,
>  unsigned long flags, const char *devname,
>  void __percpu *percpu_dev_id);
> 

Agree.

Thanks for the review.

  -- Daniel


-- 

 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH V8 1/3] irq: Add flags to request_percpu_irq function

2017-03-23 Thread Daniel Lezcano
In the next changes, we track the interrupts but we discard the timers as
that does not make sense. The next interrupt on a timer is predictable.

But, the API request_percpu_irq does not allow to pass a flag, hence specifying
if the interrupt type is a timer.

Solve this by passing a 'flags' parameter to the function and change all the
callers to pass IRQF_TIMER when the interrupt is a timer interrupt, zero
otherwise.

For now, in order to prevent a misusage of this parameter, only the IRQF_TIMER
flag is a valid parameter to be passed to the request_percpu_irq function.

Signed-off-by: Daniel Lezcano <daniel.lezc...@linaro.org>
---
 arch/arc/kernel/perf_event.c |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp_twd.c|  3 ++-
 arch/arm/xen/enlighten.c |  2 +-
 drivers/clocksource/arc_timer.c  |  2 +-
 drivers/clocksource/arm_arch_timer.c | 15 +--
 drivers/clocksource/arm_global_timer.c   |  2 +-
 drivers/clocksource/exynos_mct.c |  2 +-
 drivers/clocksource/qcom-timer.c |  2 +-
 drivers/clocksource/time-armada-370-xp.c |  2 +-
 drivers/clocksource/timer-nps.c  |  2 +-
 drivers/net/ethernet/marvell/mvneta.c|  2 +-
 drivers/perf/arm_pmu.c   |  2 +-
 include/linux/interrupt.h|  5 +++--
 kernel/irq/manage.c  | 11 ---
 virt/kvm/arm/arch_timer.c|  5 +++--
 virt/kvm/arm/vgic/vgic-init.c|  2 +-
 17 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 2ce24e7..2a90c7a 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -525,7 +525,7 @@ static int arc_pmu_device_probe(struct platform_device 
*pdev)
arc_pmu->irq = irq;
 
/* intc map function ensures irq_set_percpu_devid() called */
-   request_percpu_irq(irq, arc_pmu_intr, "ARC perf counters",
+   request_percpu_irq(irq, 0, arc_pmu_intr, "ARC perf counters",
   this_cpu_ptr(_pmu_cpu));
 
on_each_cpu(arc_cpu_pmu_irq_init, , 1);
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index f462671..5cdd3c9 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -381,7 +381,7 @@ int smp_ipi_irq_setup(int cpu, irq_hw_number_t hwirq)
if (!cpu) {
int rc;
 
-   rc = request_percpu_irq(virq, do_IPI, "IPI Interrupt", dev);
+   rc = request_percpu_irq(virq, 0, do_IPI, "IPI Interrupt", dev);
if (rc)
panic("Percpu IRQ request failed for %u\n", virq);
}
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 895ae51..988f9b9 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -332,7 +332,8 @@ static int __init twd_local_timer_common_register(struct 
device_node *np)
goto out_free;
}
 
-   err = request_percpu_irq(twd_ppi, twd_handler, "twd", twd_evt);
+   err = request_percpu_irq(twd_ppi, IRQF_TIMER, twd_handler, "twd",
+twd_evt);
if (err) {
pr_err("twd: can't register interrupt %d (%d)\n", twd_ppi, err);
goto out_free;
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 81e3217..2897f94 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -400,7 +400,7 @@ static int __init xen_guest_init(void)
 
xen_init_IRQ();
 
-   if (request_percpu_irq(xen_events_irq, xen_arm_callback,
+   if (request_percpu_irq(xen_events_irq, 0, xen_arm_callback,
   "events", _vcpu)) {
pr_err("Error request IRQ %d\n", xen_events_irq);
return -EINVAL;
diff --git a/drivers/clocksource/arc_timer.c b/drivers/clocksource/arc_timer.c
index 7517f95..e78e306 100644
--- a/drivers/clocksource/arc_timer.c
+++ b/drivers/clocksource/arc_timer.c
@@ -301,7 +301,7 @@ static int __init arc_clockevent_setup(struct device_node 
*node)
}
 
/* Needs apriori irq_set_percpu_devid() done in intc map function */
-   ret = request_percpu_irq(arc_timer_irq, timer_irq_handler,
+   ret = request_percpu_irq(arc_timer_irq, IRQF_TIMER, timer_irq_handler,
 "Timer0 (per-cpu-tick)", evt);
if (ret) {
pr_err("clockevent: unable to request irq\n");
diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index 7a8a411..11398ff 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -768,16 +768,19 @@ static int __init arch_timer_register(void)
ppi = arch_timer_ppi