Re: HRTimer causing rtctest to fail
Hi,

Thomas Gleixner writes:
> On Wed, 28 Feb 2018, Felipe Balbi wrote:
>> Thomas Gleixner writes:
>> > All numbers in micro seconds! What happens if you load the system or
>> > restrict C-States?
>>
>> added intel_idle.max_cstate=0 to cmdline. Still see some failures,
>> albeit a lot more rare, for example, caught one on iteration 8 of
>> 8192Hz:
>>
>> 8192Hz: 1 2 3 4 5 6 7
>> PIE delta error: 0.000135 should be close to 0.000122
>>
>> That is, however, only 13 uS late. Is that within the limits of "we
>> don't quite care"?
>
> I think so. 13us (depending on the machine) can easily happen if multiple
> interrupts/timers happen at the same time or if something keeps interrupts
> disabled for a few microseconds which is certainly possible.
>
> What kind of system is this?

4 core Kabylake.

>> BTW, it took some effort to get that one failure. I'll leave rtctest
>> running in a long loop and try to figure out approximately how frequently
>> it fails, but it seems like it's within the realm of "don't care", would
>> you agree?
>
> Yes.

cool, thanks

-- 
balbi
Re: HRTimer causing rtctest to fail
On Wed, 28 Feb 2018, Felipe Balbi wrote:
> Thomas Gleixner writes:
> > All numbers in micro seconds! What happens if you load the system or
> > restrict C-States?
>
> added intel_idle.max_cstate=0 to cmdline. Still see some failures,
> albeit a lot more rare, for example, caught one on iteration 8 of
> 8192Hz:
>
> 8192Hz: 1 2 3 4 5 6 7
> PIE delta error: 0.000135 should be close to 0.000122
>
> That is, however, only 13 uS late. Is that within the limits of "we
> don't quite care"?

I think so. 13us (depending on the machine) can easily happen if multiple
interrupts/timers happen at the same time or if something keeps interrupts
disabled for a few microseconds which is certainly possible.

What kind of system is this?

> BTW, it took some effort to get that one failure. I'll leave rtctest
> running in a long loop and try to figure out approximately how frequently
> it fails, but it seems like it's within the realm of "don't care", would
> you agree?

Yes.

Thanks,

	tglx
Re: HRTimer causing rtctest to fail
Thomas Gleixner writes:
> On Wed, 28 Feb 2018, Felipe Balbi wrote:
>> Thomas Gleixner writes:
>> > Enable the hrtimer and scheduling tracepoints. That should give you a hint
>> > what's going on.
>>
>> Thanks, that does give me a lot more information. So here you go:
>>
>> rtctest-1348 [003] d..2 313.766141: hrtimer_start:
>> hrtimer=667ce595 function=rtc_pie_update_irq expires=313573983010
>> softexpires=313573983010 mode=REL
>> <idle>-0 [003] d.h1 313.767189: hrtimer_expire_entry:
>> hrtimer=667ce595 function=rtc_pie_update_irq now=313574053764
>>
>> We still have a 70754 nS deviation. After changing to absolute time,
>> the deviation remains:
>>
>> <idle>-0 [000] dNh2 29.303251: hrtimer_start:
>> hrtimer=6858b496 function=rtc_pie_update_irq expires=28765551360
>> softexpires=28765551360 mode=ABS
>> <idle>-0 [000] d.h1 29.303565: hrtimer_expire_entry:
>> hrtimer=6858b496 function=rtc_pie_update_irq now=28765621916
>
> Changing REL/ABS in the kernel does not make a difference because periodic
> mode just forwards by period so even if the first timer is started with REL
> it results in an absolute timeline. What I meant is the user space
> measurement as it cannot figure out when the first event was supposed to
> happen so it's hard to calculate latency information.
>
> The interesting information is that the timer fires late and the system is
> idle. Now the question is in which idle state did the machine go?
>
> Wake up from deeper C-states can be slow. On my laptop the wakeup latencies
> are:
>
> POLL    0
> C1      2
> C1E    10
> C3     33
> C6    133
> C7S   166
> C8    300
> C9    600
> C10  2600
>
> All numbers in micro seconds! What happens if you load the system or
> restrict C-States?

added intel_idle.max_cstate=0 to cmdline. Still see some failures,
albeit a lot more rare, for example, caught one on iteration 8 of
8192Hz:

8192Hz: 1 2 3 4 5 6 7
PIE delta error: 0.000135 should be close to 0.000122

That is, however, only 13 uS late. Is that within the limits of "we
don't quite care"?

BTW, it took some effort to get that one failure. I'll leave rtctest
running in a long loop and try to figure out approximately how frequently
it fails, but it seems like it's within the realm of "don't care", would
you agree?

cheers

-- 
balbi
Re: HRTimer causing rtctest to fail
On Wed, 28 Feb 2018, Felipe Balbi wrote:
> Thomas Gleixner writes:
> > Enable the hrtimer and scheduling tracepoints. That should give you a hint
> > what's going on.
>
> Thanks, that does give me a lot more information. So here you go:
>
> rtctest-1348 [003] d..2 313.766141: hrtimer_start:
> hrtimer=667ce595 function=rtc_pie_update_irq expires=313573983010
> softexpires=313573983010 mode=REL
> <idle>-0 [003] d.h1 313.767189: hrtimer_expire_entry:
> hrtimer=667ce595 function=rtc_pie_update_irq now=313574053764
>
> We still have a 70754 nS deviation. After changing to absolute time,
> the deviation remains:
>
> <idle>-0 [000] dNh2 29.303251: hrtimer_start:
> hrtimer=6858b496 function=rtc_pie_update_irq expires=28765551360
> softexpires=28765551360 mode=ABS
> <idle>-0 [000] d.h1 29.303565: hrtimer_expire_entry:
> hrtimer=6858b496 function=rtc_pie_update_irq now=28765621916

Changing REL/ABS in the kernel does not make a difference because periodic
mode just forwards by period so even if the first timer is started with REL
it results in an absolute timeline. What I meant is the user space
measurement as it cannot figure out when the first event was supposed to
happen so it's hard to calculate latency information.

The interesting information is that the timer fires late and the system is
idle. Now the question is in which idle state did the machine go?

Wake up from deeper C-states can be slow. On my laptop the wakeup latencies
are:

POLL    0
C1      2
C1E    10
C3     33
C6    133
C7S   166
C8    300
C9    600
C10  2600

All numbers in micro seconds! What happens if you load the system or
restrict C-States?

Thanks,

	tglx
Re: HRTimer causing rtctest to fail
Hi,

Thomas Gleixner writes:
> On Tue, 27 Feb 2018, Felipe Balbi wrote:
>> I'm facing an odd problem with v4.16-rc2 (also reproducible on v4.15
>> final) where rtctest fails sometimes with PIE coming too late with
>> frequencies >= 1024 Hz.
>>
>> I've modified rtctest.c a bit so that it continues running even after
>> first failure just so I could get a glimpse of how many times it fails.
>>
>> Here are the results:
>>
>> Error 1024Hz i 17/20 diff 1089/976
>>
>> Error 2048Hz i 2/20 diff 538/488
>> Error 2048Hz i 9/20 diff 558/488
>> Error 2048Hz i 16/20 diff 560/488
>>
>> Error 4096Hz i 2/20 diff 305/244
>> Error 4096Hz i 7/20 diff 288/244
>> Error 4096Hz i 9/20 diff 285/244
>> Error 4096Hz i 11/20 diff 310/244
>> Error 4096Hz i 16/20 diff 284/244
>> Error 4096Hz i 20/20 diff 317/244
>>
>> I added a few trace_printk() calls around rtc_dev_read() (and some other
>> functions) and captured the time for entry and exit of that function. I
>> noticed that HRTimer fires way too late and also way too early
>> sometimes. On the order of 100+ us.
>
> Power management, deep idle states?
>
>> For example with iterations 17 of 1024 Hz (seen above) and 18 (also of
>> 1024 Hz) I captured the following time stamps (all in uS):
>>
>> | Start    | End      | Expected End | Diff |
>> |----------+----------+--------------+------|
>> | 35900626 | 35901711 | 35901603     | -108 | (too late)
>> | 35901826 | 35902628 | 35902803     |  174 | (too early)
>
> Too early? hrtimers never fire too early. The measurement in that test is
> bogus:
>
>	gettimeofday(&start, NULL);
>	/* This blocks */
>	retval = read(fd, &data, sizeof(unsigned long));
>	if (retval == -1) {
>		perror("read");
>		exit(errno);
>	}
>	gettimeofday(&end, NULL);
>
> It's measuring relative time between going to sleep and wakeup, so if an
> event comes late, then the next one will claim to be early which is
> nonsense because periodic mode runs on an absolute time line.
>
>	          1   2   3   4   5   6   7
>	Expected: |   |   |   |   |   |   |
>	Real:     |   |    |  |   |   |   |
>
>	               ^ late event
>
> So 2->3 would give you (period + latency) and 3->4 (period - latency)
>
> And if the task gets scheduled out before taking the start time stamp
> then this is also giving bogus results.
>
> What the test really should figure out is the deviation of the absolute
> time line. That's hard because the RTC code starts the hrtimer with:
>
>	hrtimer_start(&rtc->pie_timer, period, HRTIMER_MODE_REL);
>
> which means relative to 'now'. So there is no good way for user space to
> determine the expected timeline.
>
>> I can't come up with a proper explanation as to why HRTimer is firing at
>> such wildly odd extremes. On most of the measurements, HRTimer executes
>> within 5 uS of scheduled time and I can't convince myself that
>> scheduling latency would get so high all of a sudden. Especially
>> considering I'm testing with a 4 core machine and there's nothing else
>> running on it. It's mostly idle.
>
> Enable the hrtimer and scheduling tracepoints. That should give you a hint
> what's going on.

Thanks, that does give me a lot more information. So here you go:

rtctest-1348 [003] d..2 313.766141: hrtimer_start:
hrtimer=667ce595 function=rtc_pie_update_irq expires=313573983010
softexpires=313573983010 mode=REL
<idle>-0 [003] d.h1 313.767189: hrtimer_expire_entry:
hrtimer=667ce595 function=rtc_pie_update_irq now=313574053764

We still have a 70754 nS deviation. After changing to absolute time,
the deviation remains:

<idle>-0 [000] dNh2 29.303251: hrtimer_start:
hrtimer=6858b496 function=rtc_pie_update_irq expires=28765551360
softexpires=28765551360 mode=ABS
<idle>-0 [000] d.h1 29.303565: hrtimer_expire_entry:
hrtimer=6858b496 function=rtc_pie_update_irq now=28765621916

I took a few measurements (76 to be exact) and plotted the deviation from
expected expiry time (now - expires from the tracepoint data) and it's
looking rather odd. PNG attached (all numbers in nS).

Could this be caused by FW stealing CPU behind Linux's back? I can take
many more measurements, if necessary. Just let me know.

cheers

-- 
balbi
Re: HRTimer causing rtctest to fail
On Tue, 27 Feb 2018, Felipe Balbi wrote:
> I'm facing an odd problem with v4.16-rc2 (also reproducible on v4.15
> final) where rtctest fails sometimes with PIE coming too late with
> frequencies >= 1024 Hz.
>
> I've modified rtctest.c a bit so that it continues running even after
> first failure just so I could get a glimpse of how many times it fails.
>
> Here are the results:
>
> Error 1024Hz i 17/20 diff 1089/976
>
> Error 2048Hz i 2/20 diff 538/488
> Error 2048Hz i 9/20 diff 558/488
> Error 2048Hz i 16/20 diff 560/488
>
> Error 4096Hz i 2/20 diff 305/244
> Error 4096Hz i 7/20 diff 288/244
> Error 4096Hz i 9/20 diff 285/244
> Error 4096Hz i 11/20 diff 310/244
> Error 4096Hz i 16/20 diff 284/244
> Error 4096Hz i 20/20 diff 317/244
>
> I added a few trace_printk() calls around rtc_dev_read() (and some other
> functions) and captured the time for entry and exit of that function. I
> noticed that HRTimer fires way too late and also way too early
> sometimes. On the order of 100+ us.

Power management, deep idle states?

> For example with iterations 17 of 1024 Hz (seen above) and 18 (also of
> 1024 Hz) I captured the following time stamps (all in uS):
>
> | Start    | End      | Expected End | Diff |
> |----------+----------+--------------+------|
> | 35900626 | 35901711 | 35901603     | -108 | (too late)
> | 35901826 | 35902628 | 35902803     |  174 | (too early)

Too early? hrtimers never fire too early. The measurement in that test is
bogus:

	gettimeofday(&start, NULL);
	/* This blocks */
	retval = read(fd, &data, sizeof(unsigned long));
	if (retval == -1) {
		perror("read");
		exit(errno);
	}
	gettimeofday(&end, NULL);

It's measuring relative time between going to sleep and wakeup, so if an
event comes late, then the next one will claim to be early which is
nonsense because periodic mode runs on an absolute time line.

	          1   2   3   4   5   6   7
	Expected: |   |   |   |   |   |   |
	Real:     |   |    |  |   |   |   |

	               ^ late event

So 2->3 would give you (period + latency) and 3->4 (period - latency)

And if the task gets scheduled out before taking the start time stamp
then this is also giving bogus results.

What the test really should figure out is the deviation of the absolute
time line. That's hard because the RTC code starts the hrtimer with:

	hrtimer_start(&rtc->pie_timer, period, HRTIMER_MODE_REL);

which means relative to 'now'. So there is no good way for user space to
determine the expected timeline.

> I can't come up with a proper explanation as to why HRTimer is firing at
> such wildly odd extremes. On most of the measurements, HRTimer executes
> within 5 uS of scheduled time and I can't convince myself that
> scheduling latency would get so high all of a sudden. Especially
> considering I'm testing with a 4 core machine and there's nothing else
> running on it. It's mostly idle.

Enable the hrtimer and scheduling tracepoints. That should give you a hint
what's going on.

Thanks,

	tglx
HRTimer causing rtctest to fail
Hi folks,

I'm facing an odd problem with v4.16-rc2 (also reproducible on v4.15
final) where rtctest fails sometimes with PIE coming too late with
frequencies >= 1024 Hz.

I've modified rtctest.c a bit so that it continues running even after
first failure just so I could get a glimpse of how many times it fails.

Here are the results:

Error 1024Hz i 17/20 diff 1089/976

Error 2048Hz i 2/20 diff 538/488
Error 2048Hz i 9/20 diff 558/488
Error 2048Hz i 16/20 diff 560/488

Error 4096Hz i 2/20 diff 305/244
Error 4096Hz i 7/20 diff 288/244
Error 4096Hz i 9/20 diff 285/244
Error 4096Hz i 11/20 diff 310/244
Error 4096Hz i 16/20 diff 284/244
Error 4096Hz i 20/20 diff 317/244

I added a few trace_printk() calls around rtc_dev_read() (and some other
functions) and captured the time for entry and exit of that function. I
noticed that HRTimer fires way too late and also way too early
sometimes. On the order of 100+ us.

For example with iterations 17 of 1024 Hz (seen above) and 18 (also of
1024 Hz) I captured the following time stamps (all in uS):

| Start    | End      | Expected End | Diff |
|----------+----------+--------------+------|
| 35900626 | 35901711 | 35901603     | -108 | (too late)
| 35901826 | 35902628 | 35902803     |  174 | (too early)

I can't come up with a proper explanation as to why HRTimer is firing at
such wildly odd extremes. On most of the measurements, HRTimer executes
within 5 uS of scheduled time and I can't convince myself that
scheduling latency would get so high all of a sudden. Especially
considering I'm testing with a 4 core machine and there's nothing else
running on it. It's mostly idle.

Any hints as to how to track down this problem further? Also, do folks
agree that rtctest's 10% slack should also be considered for too early
firing times?

I mean:

diff --git a/tools/testing/selftests/timers/rtctest.c b/tools/testing/selftests/timers/rtctest.c
index 411eff625e66..99aa46fc556c 100644
--- a/tools/testing/selftests/timers/rtctest.c
+++ b/tools/testing/selftests/timers/rtctest.c
@@ -317,7 +317,8 @@ int main(int argc, char **argv)
 			gettimeofday(&end, NULL);
 			timersub(&end, &start, &diff);
 			if (diff.tv_sec > 0 ||
-			    diff.tv_usec > ((1000000L / tmp) * 1.10)) {
+			    diff.tv_usec > ((1000000L / tmp) * 1.10) ||
+			    diff.tv_usec < ((1000000L / tmp) * 0.9)) {
 				fprintf(stderr,
 					"\nPIE delta error: %ld.%06ld should be close to 0.%06ld\n",
 					diff.tv_sec, diff.tv_usec,
 					(1000000L / tmp));

-- 
balbi