Re: [PATCH] futex: Handle transient "ownerless" rtmutex state correctly

2020-11-04 Thread Gratian Crisan
the hell out of that tglx dude who had to > page in all the futex horrors again. Condensed version is above. > > [ tglx: Wrote comment and changelog ] > > Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") > Reported-by: Gratian Crisan > Signed-

Re: BUG_ON(!newowner) in fixup_pi_state_owner()

2020-11-03 Thread Gratian Crisan
Hi all, I apologize for waking up the futex demons (and replying to my own email), but ... Gratian Crisan writes: > > Brandon and I have been debugging a nasty race that leads to > BUG_ON(!newowner) in fixup_pi_state_owner() in futex.c. So far > we've only been able to reproduce the

BUG_ON(!newowner) in fixup_pi_state_owner()

2020-10-28 Thread Gratian Crisan
Hi all, Brandon and I have been debugging a nasty race that leads to BUG_ON(!newowner) in fixup_pi_state_owner() in futex.c. So far we've only been able to reproduce the issue on 4.9.y-rt kernels. We are still testing if this is a problem for later RT branches. The original reproducing app was

Re: [patch 1/1] Kconfig: Introduce CONFIG_PREEMPT_RT

2019-07-16 Thread Gratian Crisan
choice is renamed to PREEMPT_LL which select PREEMPT as > well. > > No functional change. > > Signed-off-by: Thomas Gleixner +1 from National Instruments. We have a vested interest in preempt_rt and we're committed to helping support, maintain, and test it. Glad to see this hap

Kernel page fault in vmalloc_fault() after a preempted ioremap

2018-03-08 Thread Gratian Crisan
Hi all, We are seeing kernel page faults happening on module loads with certain drivers like the i915 video driver[1]. This was initially discovered on a 4.9 PREEMPT_RT kernel. It takes 5 days on average to reproduce using a simple reboot loop test. Looking at the code paths involved I believe

Re: [PATCH] futex: Avoid violating the 10th rule of futex

2017-12-08 Thread Gratian Crisan
Peter Zijlstra writes: > On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote: > >> Yep ... looks good to me. I've been running two targets with the >> original reproducer for 8 hours now plus a target running the C test. >> All of them are still going. >>

Re: PI futexes + lock stealing woes

2017-12-07 Thread Gratian Crisan
Julia Cartwright writes: > On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote: >> >> Peter Zijlstra writes: >> > The below compiles and boots, but is otherwise untested. Could you give >> > it a spin? >> >> Thank you! Yes, I'll start a t

Re: PI futexes + lock stealing woes

2017-12-07 Thread Gratian Crisan
Peter Zijlstra writes: > The below compiles and boots, but is otherwise untested. Could you give > it a spin? Thank you! Yes, I'll start a test now. -Gratian > --- > kernel/futex.c | 83 > + > kernel/locking/rtmutex.c| 26

Re: PI futexes + lock stealing woes

2017-12-06 Thread Gratian Crisan
Peter Zijlstra writes: > On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote: > >> fixup_owner() used to have additional seemingly relevant checks in place >> that were removed 73d786bd043eb ("futex: Rework inconsistent >> rt_mutex/futex_q state"). > > *groan*... yes. I completely

Re: [RT-SUMMIT] Prague Oct. 21st

2017-10-18 Thread Gratian Crisan
Hi Thomas, Is there additional info on the address of the venue? The wiki links to the Czech Technical University web page, however from what I can tell there are multiple faculty buildings in that area of Prague. I apologize if I missed something obvious, I'm currently working with crappy

Re: [PATCH RT] time/hrtimer: Use softirq based wakeups for non-RT threads

2017-10-05 Thread Gratian Crisan
caller has a RT priority. > > Reported-by: Gratian Crisan I can confirm this patch fixes the original problem reported. I ran an overnight test (about 30 hours total) on two platforms using cyclictest + hrtimer stress load: configurable number of SCHED_OTHER threads doing random clock_nano

latency induced by hrtimer stacking

2017-01-04 Thread Gratian Crisan
); return EXIT_FAILURE; } } while (!g_stop) clock_nanosleep(CLOCK, 0, , NULL); for (i = 0; i < g_nthreads; i++) pthread_join(threads[i], NULL); return EXIT_SUCCESS; } --- [2] --- From 6e6e7c852fc8efc95db

Re: [PATCH RFC] clocksource: Detect a watchdog overflow

2016-04-07 Thread Gratian Crisan
John Stultz writes: > On Tue, Mar 15, 2016 at 11:50 AM, Gratian Crisan > wrote: >> The clocksource watchdog can falsely trigger and disable the main >> clocksource when the watchdog wraps around. >> >> The reason is that an interrupt storm and/or high priority

[PATCH RFC] clocksource: Detect a watchdog overflow

2016-03-15 Thread Gratian Crisan
can represent without overflow and do not disqualify the main clocksource if the delta since the last time we have checked exceeds the measurement capabilities of the watchdog clocksource. Signed-off-by: Gratian Crisan Cc: John Stultz Cc: Thomas Gleixner --- kernel/time/clocksource.c | 12

[RFC][PATCH] clocksource: Detect a watchdog overflow

2016-03-03 Thread gratian . crisan
From: Gratian Crisan The clocksource watchdog can falsely trigger and disable the main clocksource when the watchdog wraps. The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks can preempt the timer softirq long enough for the watchdog to wrap if it has a limited number

Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-19 Thread Gratian Crisan
Peter Zijlstra writes: > On Wed, Nov 11, 2015 at 09:41:25AM -0600, Gratian Crisan wrote: >> I also wrote a small C utility[1], with a bit of code borrowed from the >> kernel, for reading the TSC on all CPUs. It starts a high priority >> thread per CPU, tries to synchroniz

Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-17 Thread Gratian Crisan
Dave Hansen writes: > On 11/09/2015 02:02 PM, Peter Zijlstra wrote: >> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.cri...@ni.com wrote: >>> The Intel Xeon E5 processor family suffers from errata[1] BT81: >> >>> +#ifdef CONFIG_X86_TSC >>> + /* >>> +* Xeon E5 BT81 errata: TSC is not

Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-17 Thread Gratian Crisan
Peter Zijlstra writes: > On Wed, Nov 11, 2015 at 09:41:25AM -0600, Gratian Crisan wrote: >> I also wrote a small C utility[1], with a bit of code borrowed from the >> kernel, for reading the TSC on all CPUs. It starts a high priority >> thread per CPU, tries to synchroniz

Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-11 Thread Gratian Crisan
Josh Hunt writes: > On Tue, Nov 10, 2015 at 1:47 PM, Gratian Crisan wrote: >> >> The observed behavior does seem to match BT81 errata i.e. the TSC does >> not get reset on warm reboots and it is otherwise stable. >> > If you have a simple testcase to reproduce t

Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-10 Thread Gratian Crisan
Josh Hunt writes: > On Tue, Nov 10, 2015 at 12:24 PM, Josh Hunt wrote: >> >> On Mon, Nov 9, 2015 at 4:02 PM, Peter Zijlstra wrote: >>> >>> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.cri...@ni.com wrote: >>> >>> > The Intel Xeon E5 processor family suffers from errata[1] BT81: >>> >>> >

[RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error

2015-11-09 Thread gratian . crisan
From: Gratian Crisan The Intel Xeon E5 processor family suffers from errata[1] BT81: "TSC is Not Affected by Warm Reset. Problem: The TSC (Time Stamp Counter MSR 10H) should be cleared on reset. Due to this erratum the TSC is not affected by warm reset. Implication: The TSC is not cl
