RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-23 Thread Long Li
>Thanks for the clarification. > >The problem with what Ming is proposing in my mind (and it's an existing >problem that exists today), is that nvme is taking precedence over anything >else until it absolutely cannot hog the cpu in hardirq. > >In the thread Ming referenced a case where today if the

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
Sagi, Sorry it took a while to bring my system back online. With the patch, the IOPS drop is about the same as with the 1st patch. I think the excessive context switches are causing the drop in IOPS. The following are captured by "perf sched record" for 30 seconds during tests. "perf

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Long Li
> >> Long, does this patch make any difference? > > > > Sagi, > > > > Sorry it took a while to bring my system back online. > > > > With the patch, the IOPS drop is about the same as with the 1st patch. I think > the excessive context switches are causing the drop in IOPS. > > > > The following are

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
Hey Ming, Ok, so the real problem is per-cpu bounded tasks. I share Thomas's opinion about a NAPI-like approach. We already have that, it's irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe it's because we need to

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
It seems like we're attempting to stay in irq context for as long as we can instead of scheduling to softirq/thread context if we have more than a minimal amount of work to do. Without at least understanding why softirq/thread degrades us so much this code seems like the wrong approach to me.

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-18 Thread Ming Lei
On Mon, Sep 09, 2019 at 08:10:07PM -0700, Sagi Grimberg wrote: > Hey Ming, > > > > > Ok, so the real problem is per-cpu bounded tasks. > > > > > > > > I share Thomas's opinion about a NAPI-like approach. > > > > > > We already have that, it's irq_poll, but it seems that for this > > > use-case, we

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-17 Thread Long Li
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > >Hey Ming, > >>>> Ok, so the real problem is per-cpu bounded tasks. >>>> >>>> I share Thomas's opinion about a NAPI-like approach. >>> >>> We already have t

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-09 Thread Sagi Grimberg
Hey Ming, Ok, so the real problem is per-cpu bounded tasks. I share Thomas's opinion about a NAPI-like approach. We already have that, it's irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe it's because we need to

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-09 Thread Ming Lei
On Sat, Sep 07, 2019 at 06:19:20AM +0800, Ming Lei wrote: > On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote: > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > > > > >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote: >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Ming Lei
On Fri, Sep 06, 2019 at 11:30:57AM -0700, Sagi Grimberg wrote: > > > > > Ok, so the real problem is per-cpu bounded tasks. > > > > I share Thomas's opinion about a NAPI-like approach. > > We already have that, it's irq_poll, but it seems that for this > use-case, we get lower performance for some

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Ming Lei
On Fri, Sep 06, 2019 at 04:25:55PM -0600, Keith Busch wrote: > On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote: > > On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote: > > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Keith Busch
On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote: > On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote: > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > > > > >Why are all 8 nvmes sharing the same CPU for inte

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Ming Lei
On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote: > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > > >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote: > >> When one IRQ flood happens on one CPU: > >> > >>

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Keith Busch
On Fri, Sep 06, 2019 at 11:30:57AM -0700, Sagi Grimberg wrote: > > > > > Ok, so the real problem is per-cpu bounded tasks. > > > > I share Thomas's opinion about a NAPI-like approach. > > We already have that, it's irq_poll, but it seems that for this > use-case, we get lower performance for some

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Sagi Grimberg
Ok, so the real problem is per-cpu bounded tasks. I share Thomas's opinion about a NAPI-like approach. We already have that, it's irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe it's because we need to mask

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Long Li
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote: >> When one IRQ flood happens on one CPU: >> >> 1) softirq handling on this CPU can't make progress >> >> 2) kernel threa

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Keith Busch
On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote: > When one IRQ flood happens on one CPU: > > 1) softirq handling on this CPU can't make progress > > 2) kernel thread bound to this CPU can't make progress > > For example, network may require softirq to xmit packets, or another irq >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Daniel Lezcano
Hi, On 06/09/2019 03:48, Ming Lei wrote: [ ... ] >> You have not yet shared the analysis of the problem (the kernel warnings >> give the symptoms) or given the reasoning for the solution. It is hard >> to understand what you are looking for exactly and how to connect the dots. > > Let me

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Long Li
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > >On 06/09/2019 03:22, Long Li wrote: >[ ... ] >> > >> Tracing shows that the CPU was in either hardirq or softirq all the >> time before warnings. During tests, the system was un

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Daniel Lezcano
On 06/09/2019 03:22, Long Li wrote: [ ... ] > > Tracing shows that the CPU was in either hardirq or softirq all the > time before warnings. During tests, the system was unresponsive at > times. > > Ming's patch fixed this problem. The system was responsive throughout > tests. > > As for

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Ming Lei
Hi Daniel, On Thu, Sep 05, 2019 at 12:37:13PM +0200, Daniel Lezcano wrote: > > Hi Ming, > > On 05/09/2019 11:06, Ming Lei wrote: > > On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote: > >> Hi, > >> > >> On 04/09/2019 19:07, Bart Van Assche wrote: > >>> On 9/3/19 12:50 AM, Daniel

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Long Li
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism > > >Hi Ming, > >On 05/09/2019 11:06, Ming Lei wrote: >> On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote: >>> Hi, >>> >>> On 04/09/2019 19:07, Bart Van As

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Daniel Lezcano
Hi Ming, On 05/09/2019 11:06, Ming Lei wrote: > On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote: >> Hi, >> >> On 04/09/2019 19:07, Bart Van Assche wrote: >>> On 9/3/19 12:50 AM, Daniel Lezcano wrote: On 03/09/2019 09:28, Ming Lei wrote: > On Tue, Sep 03, 2019 at

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Ming Lei
On Wed, Sep 04, 2019 at 12:47:13PM -0700, Bart Van Assche wrote: > On 9/4/19 11:02 AM, Peter Zijlstra wrote: > > On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote: > > > I think it is widely known that rdtsc is a relatively slow x86 > > > instruction. > > > So I expect that using

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-05 Thread Ming Lei
On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote: > Hi, > > On 04/09/2019 19:07, Bart Van Assche wrote: > > On 9/3/19 12:50 AM, Daniel Lezcano wrote: > >> On 03/09/2019 09:28, Ming Lei wrote: > >>> On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: > It is a

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-04 Thread Bart Van Assche
On 9/4/19 11:02 AM, Peter Zijlstra wrote: On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote: I think it is widely known that rdtsc is a relatively slow x86 instruction. So I expect that using that instruction will cause a measurable overhead if it is called frequently enough. I'm

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-04 Thread Peter Zijlstra
On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote: > On 9/4/19 10:31 AM, Daniel Lezcano wrote: > > On 04/09/2019 19:07, Bart Van Assche wrote: > > > Only if CONFIG_IRQ_TIME_ACCOUNTING has been enabled. However, I don't > > > know any Linux distro that enables that option. That's

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-04 Thread Bart Van Assche
On 9/4/19 10:31 AM, Daniel Lezcano wrote: On 04/09/2019 19:07, Bart Van Assche wrote: Only if CONFIG_IRQ_TIME_ACCOUNTING has been enabled. However, I don't know any Linux distro that enables that option. That's probably because that option introduces two rdtsc() calls in each interrupt. Given

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-04 Thread Daniel Lezcano
Hi, On 04/09/2019 19:07, Bart Van Assche wrote: > On 9/3/19 12:50 AM, Daniel Lezcano wrote: >> On 03/09/2019 09:28, Ming Lei wrote: >>> On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: It is a scheduler problem then ? >>> >>> Scheduler can do nothing if the CPU is taken

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-04 Thread Bart Van Assche
On 9/3/19 12:50 AM, Daniel Lezcano wrote: On 03/09/2019 09:28, Ming Lei wrote: On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: It is a scheduler problem then ? Scheduler can do nothing if the CPU is taken completely by handling interrupt & softirq, so seems not a scheduler

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Ming Lei
On Tue, Sep 03, 2019 at 09:50:06AM +0200, Daniel Lezcano wrote: > On 03/09/2019 09:28, Ming Lei wrote: > > On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: > >> On 03/09/2019 08:31, Ming Lei wrote: > >>> Hi Daniel, > >>> > >>> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Ming Lei
On Tue, Sep 03, 2019 at 10:09:57AM +0200, Thomas Gleixner wrote: > On Tue, 3 Sep 2019, Ming Lei wrote: > > Scheduler can do nothing if the CPU is taken completely by handling > > interrupt & softirq, so seems not a scheduler problem, IMO. > > Well, but thinking more about it, the solution you are

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Thomas Gleixner
On Tue, 3 Sep 2019, Ming Lei wrote: > Scheduler can do nothing if the CPU is taken completely by handling > interrupt & softirq, so seems not a scheduler problem, IMO. Well, but thinking more about it, the solution you are proposing is more a bandaid than anything else. If you look at the

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Daniel Lezcano
On 03/09/2019 09:28, Ming Lei wrote: > On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: >> On 03/09/2019 08:31, Ming Lei wrote: >>> Hi Daniel, >>> >>> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote: Hi Ming Lei, On 03/09/2019 05:30, Ming Lei

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Ming Lei
On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote: > On 03/09/2019 08:31, Ming Lei wrote: > > Hi Daniel, > > > > On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote: > >> > >> Hi Ming Lei, > >> > >> On 03/09/2019 05:30, Ming Lei wrote: > >> > >> [ ... ] > >> > >> > >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Daniel Lezcano
On 03/09/2019 08:31, Ming Lei wrote: > Hi Daniel, > > On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote: >> >> Hi Ming Lei, >> >> On 03/09/2019 05:30, Ming Lei wrote: >> >> [ ... ] >> >> > 2) irq/timing doesn't cover softirq That's solvable, right? >>> >>> Yeah, we can

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-03 Thread Ming Lei
Hi Daniel, On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote: > > Hi Ming Lei, > > On 03/09/2019 05:30, Ming Lei wrote: > > [ ... ] > > > >>> 2) irq/timing doesn't cover softirq > >> > >> That's solvable, right? > > > > Yeah, we can extend irq/timing, but ugly for irq/timing,

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-02 Thread Daniel Lezcano
Hi Ming Lei, On 03/09/2019 05:30, Ming Lei wrote: [ ... ] >>> 2) irq/timing doesn't cover softirq >> >> That's solvable, right? > > Yeah, we can extend irq/timing, but ugly for irq/timing, since irq/timing > focuses on hardirq prediction, and softirq isn't involved in that > purpose. > >>

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-02 Thread Ming Lei
On Wed, Aug 28, 2019 at 04:07:19PM +0200, Thomas Gleixner wrote: > On Wed, 28 Aug 2019, Ming Lei wrote: > > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote: > > > On Wed, 28 Aug 2019, Ming Lei wrote: > > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > > > > >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-29 Thread Ming Lei
On Thu, Aug 29, 2019 at 06:15:00AM +, Long Li wrote: > >>>For some high-performance IO devices, interrupts may arrive very frequently, > >>>while each IO request completion takes some time. Especially on some > >>>devices (SCSI or NVMe), IO requests can be submitted concurrently from >

RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-29 Thread Long Li
>>>For some high-performance IO devices, interrupts may arrive very frequently, >>>while each IO request completion takes some time. Especially on some >>>devices (SCSI or NVMe), IO requests can be submitted concurrently from >>>multiple CPU cores, but IO completion is only done on one of these

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-28 Thread Thomas Gleixner
On Wed, 28 Aug 2019, Ming Lei wrote: > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote: > > On Wed, 28 Aug 2019, Ming Lei wrote: > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > > > > > > Also how is that supposed to work when sched_clock is jiffies based? >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-28 Thread Ming Lei
On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote: > On Wed, 28 Aug 2019, Ming Lei wrote: > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > > > > > Also how is that supposed to work when sched_clock is jiffies based? > > > > > > > > Good catch, looks

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-28 Thread Thomas Gleixner
On Wed, 28 Aug 2019, Ming Lei wrote: > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > > > > Also how is that supposed to work when sched_clock is jiffies based? > > > > > > Good catch, looks like ktime_get_ns() is needed. > > > > And what is ktime_get_ns() returning when the only

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-28 Thread Ming Lei
On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > On Wed, 28 Aug 2019, Ming Lei wrote: > > On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote: > > > On Tue, 27 Aug 2019, Ming Lei wrote: > > > > + > > > > + int cpu = raw_smp_processor_id(); > > > > + struct

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Thomas Gleixner
On Wed, 28 Aug 2019, Ming Lei wrote: > On Tue, Aug 27, 2019 at 06:19:00PM +0200, Thomas Gleixner wrote: > > > We definitely are not going to have a 64bit multiplication and division on > > > every interrupt. Aside from that, this breaks 32bit builds all over the > > > place. > > > > That said, we

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Thomas Gleixner
On Wed, 28 Aug 2019, Ming Lei wrote: > On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote: > > On Tue, 27 Aug 2019, Ming Lei wrote: > > > + > > > + int cpu = raw_smp_processor_id(); > > > + struct irq_interval *inter = per_cpu_ptr(_irq_interval, cpu); > > > + u64 delta =

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Ming Lei
On Tue, Aug 27, 2019 at 06:19:00PM +0200, Thomas Gleixner wrote: > On Tue, 27 Aug 2019, Thomas Gleixner wrote: > > On Tue, 27 Aug 2019, Ming Lei wrote: > > > +/* > > > + * Update average irq interval with the Exponential Weighted Moving > > > + * Average(EWMA) > > > + */ > > > +static void

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Ming Lei
On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote: > On Tue, 27 Aug 2019, Ming Lei wrote: > > +/* > > + * Update average irq interval with the Exponential Weighted Moving > > + * Average(EWMA) > > + */ > > +static void irq_update_interval(void) > > +{ > > +#define

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Thomas Gleixner
On Tue, 27 Aug 2019, Thomas Gleixner wrote: > On Tue, 27 Aug 2019, Ming Lei wrote: > > +/* > > + * Update average irq interval with the Exponential Weighted Moving > > + * Average(EWMA) > > + */ > > +static void irq_update_interval(void) > > +{ > > +#define IRQ_INTERVAL_EWMA_WEIGHT 128 > >

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Thomas Gleixner
On Tue, 27 Aug 2019, Ming Lei wrote: > +/* > + * Update average irq interval with the Exponential Weighted Moving > + * Average(EWMA) > + */ > +static void irq_update_interval(void) > +{ > +#define IRQ_INTERVAL_EWMA_WEIGHT 128 > +#define IRQ_INTERVAL_EWMA_PREV_FACTOR 127 > +#define

[PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-08-27 Thread Ming Lei
For some high-performance IO devices, interrupts may arrive very frequently while each IO request completion takes some time. Especially on some devices (SCSI or NVMe), IO requests can be submitted concurrently from multiple CPU cores, but IO completion is only done on one of these submission
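
Several replies above quote the patch's irq_update_interval() and its EWMA constants (weight 128, previous-average factor 127). As a rough illustration only, the update those constants imply might look like the user-space sketch below. The weight given to the new sample (128 - 127 = 1), the helper names, and the shift-based variant are assumptions made for this sketch. They are not taken from the patch text, which is truncated in these excerpts.

```c
#include <stdint.h>

/* WEIGHT and PREV_FACTOR are the values quoted in the replies.
 * CURR_FACTOR is an assumption: the remainder of the weight. */
#define IRQ_INTERVAL_EWMA_WEIGHT       128
#define IRQ_INTERVAL_EWMA_PREV_FACTOR  127
#define IRQ_INTERVAL_EWMA_CURR_FACTOR \
	(IRQ_INTERVAL_EWMA_WEIGHT - IRQ_INTERVAL_EWMA_PREV_FACTOR)

/* Hypothetical helper: EWMA of the interrupt interval using a 64-bit
 * multiplication and division, the form objected to in the thread. */
static uint64_t ewma_update_div(uint64_t avg, uint64_t delta)
{
	return (avg * IRQ_INTERVAL_EWMA_PREV_FACTOR +
		delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) /
		IRQ_INTERVAL_EWMA_WEIGHT;
}

/* Hypothetical variant: since 128 == 1 << 7, the same weighting can be
 * approximated with shifts only, avoiding multiply and divide. */
static uint64_t ewma_update_shift(uint64_t avg, uint64_t delta)
{
	return avg - (avg >> 7) + (delta >> 7);
}
```

Both helpers leave a steady average unchanged when the new interval equals it, and both move the average by roughly 1/128 of the difference per sample. The shift form is one common way to sidestep per-interrupt 64-bit division. Whether the thread adopted it is not shown in these excerpts.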