Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 10:54 PM, Andi Kleen wrote: >> I did a quite a bit of ftracing to look for spots inside the handler >> which were taking large amounts of time. There were none. The >> execution time was spread very evenly over the entire nmi handler. It >> didn't appear to be any individual hot cachelines or doing something >> silly like sitting in a loop handling lots of PMU events. > > In some cases callgraphs seemed to be quite expensive (large fraction > of the handler time) > In my case, I am not doing callgraphs, just regular cycles. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
> I did a quite a bit of ftracing to look for spots inside the handler > which were taking large amounts of time. There were none. The > execution time was spread very evenly over the entire nmi handler. It > didn't appear to be any individual hot cachelines or doing something > silly like sitting in a loop handling lots of PMU events. In some cases callgraphs seemed to be quite expensive (large fraction of the handler time) -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/08/2013 01:20 PM, Stephane Eranian wrote: > On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen wrote: >> If the interrupts _consistently_ take too long individually they can >> starve out all the other CPU users. I saw no way to make them finish >> faster, so the only recourse is to also drop the rate. >> > I think we need to investigate why some interrupts take so much time. > Could be HW, could be SW. Not talking about old hardware here. > Once we understand this, then we know maybe adjust the timing on > our patch. I spent quite a while looking at it on my hardware. It's difficult to profile in NMIs, but I'm fairly satisfied (for me) it is a NUMA issue which gets worse as I add cores. I did a quite a bit of ftracing to look for spots inside the handler which were taking large amounts of time. There were none. The execution time was spread very evenly over the entire nmi handler. It didn't appear to be any individual hot cachelines or doing something silly like sitting in a loop handling lots of PMU events. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen wrote: > On 07/08/2013 11:08 AM, Stephane Eranian wrote: >> I admit I have some issues with your patch and what it is trying to avoid. >> There is already interrupt throttling. Your code seems to address latency >> issues on the handler rather than rate issues. Yet to mitigate the latency >> it is modify the throttling. > > If we have too many interrupts, we need to drop the rate (existing > throttling). > > If the interrupts _consistently_ take too long individually they can > starve out all the other CPU users. I saw no way to make them finish > faster, so the only recourse is to also drop the rate. > I think we need to investigate why some interrupts take so much time. Could be HW, could be SW. Not talking about old hardware here. Once we understand this, then we know maybe adjust the timing on our patch. >> For some unknown reasons, my HSW interrupt handler goes crazy for >> a while running a very simple: >>$ perf record -e cycles branchy_loop >> >> And I do see in the log: >> perf samples too long (2546 > 2500), lowering >> kernel.perf_event_max_sample_rate to 5 >> >> Which is an enormous latency. I instrumented the code, and under >> normal conditions the latency >> of the handler for this perf run, is about 500ns and it is consistent >> with what I see on SNB. > > I was seeing latencies near 1 second from time to time, but > _consistently_ in the hundreds of milliseconds. On my systems, I see 500ns with one session running. But on HSW, something else is going on with bursts at 2500ns. That's not normal. I want an explanation for this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/08/2013 11:08 AM, Stephane Eranian wrote: > I admit I have some issues with your patch and what it is trying to avoid. > There is already interrupt throttling. Your code seems to address latency > issues on the handler rather than rate issues. Yet to mitigate the latency > it is modify the throttling. If we have too many interrupts, we need to drop the rate (existing throttling). If the interrupts _consistently_ take too long individually they can starve out all the other CPU users. I saw no way to make them finish faster, so the only recourse is to also drop the rate. > For some unknown reasons, my HSW interrupt handler goes crazy for > a while running a very simple: >$ perf record -e cycles branchy_loop > > And I do see in the log: > perf samples too long (2546 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > > Which is an enormous latency. I instrumented the code, and under > normal conditions the latency > of the handler for this perf run, is about 500ns and it is consistent > with what I see on SNB. I was seeing latencies near 1 second from time to time, but _consistently_ in the hundreds of milliseconds. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 4:34 PM, Dave Hansen wrote: > On 07/04/2013 03:30 PM, Stephane Eranian wrote: >> There was an misunderstanding on the API of the do_div() >> macro. It returns the remainder of the division and this >> was not what the function expected leading to disabling the >> interrupt latency watchdog. > > "Misunderstanding" is a very kind term for what I did there. Stephane, > were you actually running in to the new cpu limit, or were you just > trying to update kernel.perf_event_max_sample_rate? > I am chasing a problem on HSW desktop where your code triggers the throttling. It happens systematically on my system first time I run perf record usually the first time. I admit I have some issues with your patch and what it is trying to avoid. There is already interrupt throttling. Your code seems to address latency issues on the handler rather than rate issues. Yet to mitigate the latency it is modify the throttling. For some unknown reasons, my HSW interrupt handler goes crazy for a while running a very simple: $ perf record -e cycles branchy_loop And I do see in the log: perf samples too long (2546 > 2500), lowering kernel.perf_event_max_sample_rate to 5 Which is an enormous latency. I instrumented the code, and under normal conditions the latency of the handler for this perf run, is about 500ns and it is consistent with what I see on SNB. However, on HSW, it is sometimes 5x longer. I have no explanation for this. Andi's patch to delay the NMI ack is enabled but it does not alleviate this problem. There is something else going on possibly with the HW, and I don't know what it is. This is not a one off. It lasts for several calls because it fires your watchdog which averages out over multiple calls. So one cannot blame this on the case where the samping buffer gets full, for instance. My model is: processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz stepping : 3 microcode : 0x7 So something is not right and I would like to know what it is. I will keep investigating. > BTW, I also did the same thing in 2ab00456: > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2ab00456ea8a0d79acb1390659b98416111880b2 > > I'll have a patch out shortly for that one, but its damage would be > limited to a bogus printk. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/04/2013 03:30 PM, Stephane Eranian wrote: > There was an misunderstanding on the API of the do_div() > macro. It returns the remainder of the division and this > was not what the function expected leading to disabling the > interrupt latency watchdog. "Misunderstanding" is a very kind term for what I did there. Stephane, were you actually running in to the new cpu limit, or were you just trying to update kernel.perf_event_max_sample_rate? BTW, I also did the same thing in 2ab00456: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2ab00456ea8a0d79acb1390659b98416111880b2 I'll have a patch out shortly for that one, but its damage would be limited to a bogus printk. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/04/2013 03:30 PM, Stephane Eranian wrote: There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. Misunderstanding is a very kind term for what I did there. Stephane, were you actually running in to the new cpu limit, or were you just trying to update kernel.perf_event_max_sample_rate? BTW, I also did the same thing in 2ab00456: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2ab00456ea8a0d79acb1390659b98416111880b2 I'll have a patch out shortly for that one, but its damage would be limited to a bogus printk. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 4:34 PM, Dave Hansen dave.han...@intel.com wrote: On 07/04/2013 03:30 PM, Stephane Eranian wrote: There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. Misunderstanding is a very kind term for what I did there. Stephane, were you actually running in to the new cpu limit, or were you just trying to update kernel.perf_event_max_sample_rate? I am chasing a problem on HSW desktop where your code triggers the throttling. It happens systematically on my system first time I run perf record usually the first time. I admit I have some issues with your patch and what it is trying to avoid. There is already interrupt throttling. Your code seems to address latency issues on the handler rather than rate issues. Yet to mitigate the latency it is modify the throttling. For some unknown reasons, my HSW interrupt handler goes crazy for a while running a very simple: $ perf record -e cycles branchy_loop And I do see in the log: perf samples too long (2546 2500), lowering kernel.perf_event_max_sample_rate to 5 Which is an enormous latency. I instrumented the code, and under normal conditions the latency of the handler for this perf run, is about 500ns and it is consistent with what I see on SNB. However, on HSW, it is sometimes 5x longer. I have no explanation for this. Andi's patch to delay the NMI ack is enabled but it does not alleviate this problem. There is something else going on possibly with the HW, and I don't know what it is. This is not a one off. It lasts for several calls because it fires your watchdog which averages out over multiple calls. So one cannot blame this on the case where the samping buffer gets full, for instance. My model is: processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz stepping : 3 microcode : 0x7 So something is not right and I would like to know what it is. I will keep investigating. BTW, I also did the same thing in 2ab00456: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2ab00456ea8a0d79acb1390659b98416111880b2 I'll have a patch out shortly for that one, but its damage would be limited to a bogus printk. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/08/2013 11:08 AM, Stephane Eranian wrote: I admit I have some issues with your patch and what it is trying to avoid. There is already interrupt throttling. Your code seems to address latency issues on the handler rather than rate issues. Yet to mitigate the latency it is modify the throttling. If we have too many interrupts, we need to drop the rate (existing throttling). If the interrupts _consistently_ take too long individually they can starve out all the other CPU users. I saw no way to make them finish faster, so the only recourse is to also drop the rate. For some unknown reasons, my HSW interrupt handler goes crazy for a while running a very simple: $ perf record -e cycles branchy_loop And I do see in the log: perf samples too long (2546 2500), lowering kernel.perf_event_max_sample_rate to 5 Which is an enormous latency. I instrumented the code, and under normal conditions the latency of the handler for this perf run, is about 500ns and it is consistent with what I see on SNB. I was seeing latencies near 1 second from time to time, but _consistently_ in the hundreds of milliseconds. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen dave.han...@intel.com wrote: On 07/08/2013 11:08 AM, Stephane Eranian wrote: I admit I have some issues with your patch and what it is trying to avoid. There is already interrupt throttling. Your code seems to address latency issues on the handler rather than rate issues. Yet to mitigate the latency it is modify the throttling. If we have too many interrupts, we need to drop the rate (existing throttling). If the interrupts _consistently_ take too long individually they can starve out all the other CPU users. I saw no way to make them finish faster, so the only recourse is to also drop the rate. I think we need to investigate why some interrupts take so much time. Could be HW, could be SW. Not talking about old hardware here. Once we understand this, then we know maybe adjust the timing on our patch. For some unknown reasons, my HSW interrupt handler goes crazy for a while running a very simple: $ perf record -e cycles branchy_loop And I do see in the log: perf samples too long (2546 2500), lowering kernel.perf_event_max_sample_rate to 5 Which is an enormous latency. I instrumented the code, and under normal conditions the latency of the handler for this perf run, is about 500ns and it is consistent with what I see on SNB. I was seeing latencies near 1 second from time to time, but _consistently_ in the hundreds of milliseconds. On my systems, I see 500ns with one session running. But on HSW, something else is going on with bursts at 2500ns. That's not normal. I want an explanation for this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On 07/08/2013 01:20 PM, Stephane Eranian wrote: On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen dave.han...@intel.com wrote: If the interrupts _consistently_ take too long individually they can starve out all the other CPU users. I saw no way to make them finish faster, so the only recourse is to also drop the rate. I think we need to investigate why some interrupts take so much time. Could be HW, could be SW. Not talking about old hardware here. Once we understand this, then we know maybe adjust the timing on our patch. I spent quite a while looking at it on my hardware. It's difficult to profile in NMIs, but I'm fairly satisfied (for me) it is a NUMA issue which gets worse as I add cores. I did a quite a bit of ftracing to look for spots inside the handler which were taking large amounts of time. There were none. The execution time was spread very evenly over the entire nmi handler. It didn't appear to be any individual hot cachelines or doing something silly like sitting in a loop handling lots of PMU events. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
I did a quite a bit of ftracing to look for spots inside the handler which were taking large amounts of time. There were none. The execution time was spread very evenly over the entire nmi handler. It didn't appear to be any individual hot cachelines or doing something silly like sitting in a loop handling lots of PMU events. In some cases callgraphs seemed to be quite expensive (large fraction of the handler time) -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
On Mon, Jul 8, 2013 at 10:54 PM, Andi Kleen a...@linux.jf.intel.com wrote: I did a quite a bit of ftracing to look for spots inside the handler which were taking large amounts of time. There were none. The execution time was spread very evenly over the entire nmi handler. It didn't appear to be any individual hot cachelines or doing something silly like sitting in a loop handling lots of PMU events. In some cases callgraphs seemed to be quite expensive (large fraction of the handler time) In my case, I am not doing callgraphs, just regular cycles. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
* Stephane Eranian wrote: > This patch fixes a serious bug in: > > commit 14c63f17b1fde5a575a28e96547a22b451c71fb5 > Author: Dave Hansen > Date: Fri Jun 21 08:51:36 2013 -0700 > > perf: Drop sample rate when sampling is too slow > > > There was an misunderstanding on the API of the do_div() > macro. It returns the remainder of the division and this > was not what the function expected leading to disabling the > interrupt latency watchdog. > > This patch also remove a duplicate assignment in > perf_sample_event_took(). > > Signed-off-by: Stephane Eranian Thanks Stephane - I've applied this to perf/urgent and will get it to Linus ASAP. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix interrupt handler timing harness
* Stephane Eranian eran...@google.com wrote: This patch fixes a serious bug in: commit 14c63f17b1fde5a575a28e96547a22b451c71fb5 Author: Dave Hansen dave.han...@linux.intel.com Date: Fri Jun 21 08:51:36 2013 -0700 perf: Drop sample rate when sampling is too slow There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. This patch also remove a duplicate assignment in perf_sample_event_took(). Signed-off-by: Stephane Eranian eran...@google.com Thanks Stephane - I've applied this to perf/urgent and will get it to Linus ASAP. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] perf: fix interrupt handler timing harness
This patch fixes a serious bug in: commit 14c63f17b1fde5a575a28e96547a22b451c71fb5 Author: Dave Hansen Date: Fri Jun 21 08:51:36 2013 -0700 perf: Drop sample rate when sampling is too slow There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. This patch also remove a duplicate assignment in perf_sample_event_took(). Signed-off-by: Stephane Eranian --- diff --git a/kernel/events/core.c b/kernel/events/core.c index 1db3af9..1833bc5 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -182,7 +182,7 @@ void update_perf_cpu_limits(void) u64 tmp = perf_sample_period_ns; tmp *= sysctl_perf_cpu_time_max_percent; - tmp = do_div(tmp, 100); + do_div(tmp, 100); atomic_set(_sample_allowed_ns, tmp); } @@ -232,7 +232,7 @@ DEFINE_PER_CPU(u64, running_sample_length); void perf_sample_event_took(u64 sample_len_ns) { u64 avg_local_sample_len; - u64 local_samples_len = __get_cpu_var(running_sample_length); + u64 local_samples_len; if (atomic_read(_sample_allowed_ns) == 0) return; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] perf: fix interrupt handler timing harness
This patch fixes a serious bug in: commit 14c63f17b1fde5a575a28e96547a22b451c71fb5 Author: Dave Hansen dave.han...@linux.intel.com Date: Fri Jun 21 08:51:36 2013 -0700 perf: Drop sample rate when sampling is too slow There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. This patch also remove a duplicate assignment in perf_sample_event_took(). Signed-off-by: Stephane Eranian eran...@google.com --- diff --git a/kernel/events/core.c b/kernel/events/core.c index 1db3af9..1833bc5 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -182,7 +182,7 @@ void update_perf_cpu_limits(void) u64 tmp = perf_sample_period_ns; tmp *= sysctl_perf_cpu_time_max_percent; - tmp = do_div(tmp, 100); + do_div(tmp, 100); atomic_set(perf_sample_allowed_ns, tmp); } @@ -232,7 +232,7 @@ DEFINE_PER_CPU(u64, running_sample_length); void perf_sample_event_took(u64 sample_len_ns) { u64 avg_local_sample_len; - u64 local_samples_len = __get_cpu_var(running_sample_length); + u64 local_samples_len; if (atomic_read(perf_sample_allowed_ns) == 0) return; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/