Re: IRQ affinity notifiers vs RT
On 10/01/2013 01:44 AM, Ben Hutchings wrote:
> Right, but it looks quite strange to have this thread just for a
> (probably) quite rare event. Maybe some day someone will work out how
> to make Octeon's IRQ management less unusual and then you can use the
> simpler approach for RT...

We have three threads for some rare events by now if I count correctly.
That is:
- mce check on x86 (mce_notify_helper_thread)
- this one
- clock_was_set_delayed() (pending)

So right now I am thinking about a workqueue for rare events, because if
I keep doing this we end up with a bunch of useless tasks for something
that might never occur.

> Ben.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: IRQ affinity notifiers vs RT
On Mon, 2013-09-23 at 14:58 +0200, Sebastian Andrzej Siewior wrote:
> On 08/30/2013 11:29 PM, Ben Hutchings wrote:
> > Sebastian, I saw you came up with a fix for this but apparently without
> > seeing my earlier message:
>
> Yes Ben, I haven't seen it. If I was on Cc, I'm very sorry for
> overlooking it.

You weren't, as I didn't realise you were maintaining the RT patch set
then.

> > On Thu, 2013-07-25 at 00:31 +0100, Ben Hutchings wrote:
> >> Workqueue code uses spin_lock_irq() on the workqueue lock, which with
> >> PREEMPT_RT enabled doesn't actually block IRQs.
> >>
> >> In 3.6, the irq_cpu_rmap functions rely on a workqueue flush to finish
> >> any outstanding notifications before freeing the cpu_rmap that they use.
> >> This won't be reliable if the notification is scheduled after releasing
> >> the irq_desc lock.
> >>
> >> However, following commit 896f97ea95c1 ('lib: cpu_rmap: avoid flushing
> >> all workqueues') in 3.8, I think that it is sufficient to do only
> >> kref_get(&desc->affinity_notify->kref) in __irq_set_affinity_locked()
> >> and then call schedule_work() in irq_set_affinity() after releasing the
> >> lock. Something like this (untested):
> >
> > Does the following make sense to you?
>
> This was suggested by the original submitter on rt-users@v.k.o (Joe
> Korty), where I was made aware of this for the first time. This is okay
> except for the part where the workqueue is not scheduled if called by
> the __ function (i.e. the mips case). If I read the code correctly, the
> CPU goes offline, the affinity change should be updated / users
> notified, and this is not the case with this patch.
>
> It is a valid question why only one mips SoC needs the function. If you
> look at commit 0c3263870f ("MIPS: Octeon: Rewrite interrupt handling
> code.") you can see that tglx himself made this adjustment, so it might
> be valid :) Therefore I assume that we may get more callers of this
> function and the workqueue should be executed, and I made something
> simple that works on RT.

Right, but it looks quite strange to have this thread just for a
(probably) quite rare event. Maybe some day someone will work out how
to make Octeon's IRQ management less unusual and then you can use the
simpler approach for RT...

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
Re: IRQ affinity notifiers vs RT
On 08/30/2013 11:29 PM, Ben Hutchings wrote:
> Sebastian, I saw you came up with a fix for this but apparently without
> seeing my earlier message:

Yes Ben, I haven't seen it. If I was on Cc, I'm very sorry for
overlooking it.

> On Thu, 2013-07-25 at 00:31 +0100, Ben Hutchings wrote:
>> Workqueue code uses spin_lock_irq() on the workqueue lock, which with
>> PREEMPT_RT enabled doesn't actually block IRQs.
>>
>> In 3.6, the irq_cpu_rmap functions rely on a workqueue flush to finish
>> any outstanding notifications before freeing the cpu_rmap that they use.
>> This won't be reliable if the notification is scheduled after releasing
>> the irq_desc lock.
>>
>> However, following commit 896f97ea95c1 ('lib: cpu_rmap: avoid flushing
>> all workqueues') in 3.8, I think that it is sufficient to do only
>> kref_get(&desc->affinity_notify->kref) in __irq_set_affinity_locked()
>> and then call schedule_work() in irq_set_affinity() after releasing the
>> lock. Something like this (untested):
>
> Does the following make sense to you?

This was suggested by the original submitter on rt-users@v.k.o (Joe
Korty), where I was made aware of this for the first time. This is okay
except for the part where the workqueue is not scheduled if called by
the __ function (i.e. the mips case). If I read the code correctly, the
CPU goes offline, the affinity change should be updated / users
notified, and this is not the case with this patch.

It is a valid question why only one mips SoC needs the function. If you
look at commit 0c3263870f ("MIPS: Octeon: Rewrite interrupt handling
code.") you can see that tglx himself made this adjustment, so it might
be valid :) Therefore I assume that we may get more callers of this
function and the workqueue should be executed, and I made something
simple that works on RT.

> Ben.

Sebastian
Re: IRQ affinity notifiers vs RT
Sebastian, I saw you came up with a fix for this but apparently without
seeing my earlier message:

On Thu, 2013-07-25 at 00:31 +0100, Ben Hutchings wrote:
> Alexandra Kossovsky reported the following lockdep splat when testing an
> out-of-tree version of sfc on 3.6-rt. The problem is specific to RT,
> and we haven't tested anything later but I think it's still unfixed.
>
> > ==
> > [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> > 3.6.11.2-rt33.39.el6rt.x86_64.debug #1 Tainted: G O
> > --
> > insmod/3076 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> > (&(&(gcwq->lock)->lock)->wait_lock){+.+...}, at: [81589a78]
> > rt_spin_lock_slowlock+0x48/0x2f0
> >
> > and this task is already holding:
> > (irq_desc_lock_class){-.}, at: [810ec226] irq_set_affinity+0x46/0x80
>
> irq_set_affinity() holds the irq_desc lock, and then schedules a work
> item to call the IRQ affinity notifier.
>
> > which would create a new lock dependency:
> > (irq_desc_lock_class){-.} ->
> > (&(&(gcwq->lock)->lock)->wait_lock){+.+...}
> >
> > but this new dependency connects a HARDIRQ-irq-safe lock:
> > (irq_desc_lock_class){-.}
> > ... which became HARDIRQ-irq-safe at:
> > [810adba2] mark_irqflags+0x172/0x190
> > [810af2c4] __lock_acquire+0x344/0x4e0
> > [810af4ea] lock_acquire+0x8a/0x140
> > [8158ac50] _raw_spin_lock+0x40/0x80
> > [810edfae] handle_level_irq+0x1e/0x100
> > [810044e1] handle_irq+0x71/0x190
> > [815943ad] do_IRQ+0x5d/0xe0
> > [8158b32c] ret_from_intr+0x0/0x13
> > [81d16d9d] tsc_init+0x24/0x102
> > [81d13d77] x86_late_time_init+0xf/0x11
> > [81d10cfe] start_kernel+0x312/0x3c6
> > [81d1032d] x86_64_start_reservations+0x131/0x136
> > [81d1041f] x86_64_start_kernel+0xed/0xf4
>
> Obviously, irq_desc is used in hard-IRQ context.
>
> > to a HARDIRQ-irq-unsafe lock:
> > (&(&(gcwq->lock)->lock)->wait_lock){+.+...}
> > ... which became HARDIRQ-irq-unsafe at:
> > ...
> > [810adb50] mark_irqflags+0x120/0x190
> > [810af2c4] __lock_acquire+0x344/0x4e0
> > [810af4ea] lock_acquire+0x8a/0x140
> > [8158ac50] _raw_spin_lock+0x40/0x80
> > [81589a78] rt_spin_lock_slowlock+0x48/0x2f0
> > [8158a1b6] rt_spin_lock+0x16/0x40
> > [8106cf99] create_worker+0x69/0x220
> > [81d2bae5] init_workqueues+0x24b/0x3f8
> > [810001c2] do_one_initcall+0x42/0x170
> > [81d10775] kernel_init+0xe5/0x17a
> > [81593c24] kernel_thread_helper+0x4/0x10
> [...]
>
> Workqueue code uses spin_lock_irq() on the workqueue lock, which with
> PREEMPT_RT enabled doesn't actually block IRQs.
>
> In 3.6, the irq_cpu_rmap functions rely on a workqueue flush to finish
> any outstanding notifications before freeing the cpu_rmap that they use.
> This won't be reliable if the notification is scheduled after releasing
> the irq_desc lock.
>
> However, following commit 896f97ea95c1 ('lib: cpu_rmap: avoid flushing
> all workqueues') in 3.8, I think that it is sufficient to do only
> kref_get(&desc->affinity_notify->kref) in __irq_set_affinity_locked()
> and then call schedule_work() in irq_set_affinity() after releasing the
> lock. Something like this (untested):

Does the following make sense to you?

Ben.

> diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
> index 9d36774..0406481 100644
> --- a/arch/mips/cavium-octeon/octeon-irq.c
> +++ b/arch/mips/cavium-octeon/octeon-irq.c
> @@ -635,7 +635,8 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
>  		cpumask_clear(&new_affinity);
>  		cpumask_set_cpu(cpumask_first(cpu_online_mask), &new_affinity);
>  	}
> -	__irq_set_affinity_locked(data, &new_affinity);
> +	/* XXX No-one else calls this; why does this chip need it? */
> +	__irq_set_affinity_locked(data, &new_affinity, NULL);
>  }
>  
>  static int octeon_irq_ciu_set_affinity(struct irq_data *data,
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index f04d3ba..de992f4 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -380,7 +380,9 @@ extern void remove_percpu_irq(unsigned int irq, struct irqaction *act);
>  
>  extern void irq_cpu_online(void);
>  extern void irq_cpu_offline(void);
> -extern int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask);
> +extern int __irq_set_affinity_locked(struct irq_data *data,
> +				     const struct cpumask *cpumask,
> +				     struct irq_affinity_notify **notify);
>  
>  #ifdef CONFIG_GENERIC_HARDIRQS
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 514bcfd..157afa2 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -162,12 +162,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  	return ret;
>  }
>  
> -int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
> +int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
> +			      struct irq_affinity_notify **notify)
>  {
>  	struct irq_chip *chip = irq_data_get_irq_chip(data);
IRQ affinity notifiers vs RT
Alexandra Kossovsky reported the following lockdep splat when testing an
out-of-tree version of sfc on 3.6-rt. The problem is specific to RT,
and we haven't tested anything later but I think it's still unfixed.

> ==
> [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> 3.6.11.2-rt33.39.el6rt.x86_64.debug #1 Tainted: G O
> --
> insmod/3076 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> (&(&(gcwq->lock)->lock)->wait_lock){+.+...}, at: [81589a78]
> rt_spin_lock_slowlock+0x48/0x2f0
>
> and this task is already holding:
> (irq_desc_lock_class){-.}, at: [810ec226] irq_set_affinity+0x46/0x80

irq_set_affinity() holds the irq_desc lock, and then schedules a work
item to call the IRQ affinity notifier.

> which would create a new lock dependency:
> (irq_desc_lock_class){-.} ->
> (&(&(gcwq->lock)->lock)->wait_lock){+.+...}
>
> but this new dependency connects a HARDIRQ-irq-safe lock:
> (irq_desc_lock_class){-.}
> ... which became HARDIRQ-irq-safe at:
> [810adba2] mark_irqflags+0x172/0x190
> [810af2c4] __lock_acquire+0x344/0x4e0
> [810af4ea] lock_acquire+0x8a/0x140
> [8158ac50] _raw_spin_lock+0x40/0x80
> [810edfae] handle_level_irq+0x1e/0x100
> [810044e1] handle_irq+0x71/0x190
> [815943ad] do_IRQ+0x5d/0xe0
> [8158b32c] ret_from_intr+0x0/0x13
> [81d16d9d] tsc_init+0x24/0x102
> [81d13d77] x86_late_time_init+0xf/0x11
> [81d10cfe] start_kernel+0x312/0x3c6
> [81d1032d] x86_64_start_reservations+0x131/0x136
> [81d1041f] x86_64_start_kernel+0xed/0xf4

Obviously, irq_desc is used in hard-IRQ context.

> to a HARDIRQ-irq-unsafe lock:
> (&(&(gcwq->lock)->lock)->wait_lock){+.+...}
> ... which became HARDIRQ-irq-unsafe at:
> ...
> [810adb50] mark_irqflags+0x120/0x190
> [810af2c4] __lock_acquire+0x344/0x4e0
> [810af4ea] lock_acquire+0x8a/0x140
> [8158ac50] _raw_spin_lock+0x40/0x80
> [81589a78] rt_spin_lock_slowlock+0x48/0x2f0
> [8158a1b6] rt_spin_lock+0x16/0x40
> [8106cf99] create_worker+0x69/0x220
> [81d2bae5] init_workqueues+0x24b/0x3f8
> [810001c2] do_one_initcall+0x42/0x170
> [81d10775] kernel_init+0xe5/0x17a
> [81593c24] kernel_thread_helper+0x4/0x10
[...]

Workqueue code uses spin_lock_irq() on the workqueue lock, which with
PREEMPT_RT enabled doesn't actually block IRQs.

In 3.6, the irq_cpu_rmap functions rely on a workqueue flush to finish
any outstanding notifications before freeing the cpu_rmap that they use.
This won't be reliable if the notification is scheduled after releasing
the irq_desc lock.

However, following commit 896f97ea95c1 ('lib: cpu_rmap: avoid flushing
all workqueues') in 3.8, I think that it is sufficient to do only
kref_get(&desc->affinity_notify->kref) in __irq_set_affinity_locked()
and then call schedule_work() in irq_set_affinity() after releasing the
lock. Something like this (untested):

diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 9d36774..0406481 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -635,7 +635,8 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
 		cpumask_clear(&new_affinity);
 		cpumask_set_cpu(cpumask_first(cpu_online_mask), &new_affinity);
 	}
-	__irq_set_affinity_locked(data, &new_affinity);
+	/* XXX No-one else calls this; why does this chip need it? */
+	__irq_set_affinity_locked(data, &new_affinity, NULL);
 }
 
 static int octeon_irq_ciu_set_affinity(struct irq_data *data,
diff --git a/include/linux/irq.h b/include/linux/irq.h
index f04d3ba..de992f4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -380,7 +380,9 @@ extern void remove_percpu_irq(unsigned int irq, struct irqaction *act);
 
 extern void irq_cpu_online(void);
 extern void irq_cpu_offline(void);
-extern int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask);
+extern int __irq_set_affinity_locked(struct irq_data *data,
+				     const struct cpumask *cpumask,
+				     struct irq_affinity_notify **notify);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 514bcfd..157afa2 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -162,12 +162,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	return ret;
 }
 
-int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
+int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
+			      struct irq_affinity_notify **notify)
 {
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	struct irq_desc *desc = irq_data_to_desc(data);
 	int ret = 0;
 
+	if (notify)
+		*notify = NULL;
+
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
@@ -178,9 +182,9 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 		irq_copy_pending(desc, mask);
 	}
 
-	if (desc->affinity_notify) {
+	if (notify
IRQ affinity notifiers vs RT
Alexandra Kossovsky reported the following lockdep splat when testing an out-of-tree version of sfc on 3.6-rt. The problem is specific to RT, and we haven't tested anything later but I think it's still unfixed. == [ INFO: HARDIRQ-safe - HARDIRQ-unsafe lock order detected ] 3.6.11.2-rt33.39.el6rt.x86_64.debug #1 Tainted: G O -- insmod/3076 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (((gcwq-lock)-lock)-wait_lock){+.+...}, at: [81589a78] rt_spin_lock_slowlock+0x48/0x2f0 and this task is already holding: (irq_desc_lock_class){-.}, at: [810ec226] irq_set_affinity+0x46/0x80 irq_set_affinity() holds the irq_desc lock, and then schedules a work item to call the IRQ affinity notifier. which would create a new lock dependency: (irq_desc_lock_class){-.} - (((gcwq-lock)-lock)-wait_lock){+.+...} but this new dependency connects a HARDIRQ-irq-safe lock: (irq_desc_lock_class){-.} ... which became HARDIRQ-irq-safe at: [810adba2] mark_irqflags+0x172/0x190 [810af2c4] __lock_acquire+0x344/0x4e0 [810af4ea] lock_acquire+0x8a/0x140 [8158ac50] _raw_spin_lock+0x40/0x80 [810edfae] handle_level_irq+0x1e/0x100 [810044e1] handle_irq+0x71/0x190 [815943ad] do_IRQ+0x5d/0xe0 [8158b32c] ret_from_intr+0x0/0x13 [81d16d9d] tsc_init+0x24/0x102 [81d13d77] x86_late_time_init+0xf/0x11 [81d10cfe] start_kernel+0x312/0x3c6 [81d1032d] x86_64_start_reservations+0x131/0x136 [81d1041f] x86_64_start_kernel+0xed/0xf4 Obviously, irq_desc is used in hard-IRQ context. to a HARDIRQ-irq-unsafe lock: (((gcwq-lock)-lock)-wait_lock){+.+...} ... which became HARDIRQ-irq-unsafe at: ... [810adb50] mark_irqflags+0x120/0x190 [810af2c4] __lock_acquire+0x344/0x4e0 [810af4ea] lock_acquire+0x8a/0x140 [8158ac50] _raw_spin_lock+0x40/0x80 [81589a78] rt_spin_lock_slowlock+0x48/0x2f0 [8158a1b6] rt_spin_lock+0x16/0x40 [8106cf99] create_worker+0x69/0x220 [81d2bae5] init_workqueues+0x24b/0x3f8 [810001c2] do_one_initcall+0x42/0x170 [81d10775] kernel_init+0xe5/0x17a [81593c24] kernel_thread_helper+0x4/0x10 [...] 
Workqueue code uses spin_lock_irq() on the workqueue lock, which with PREEMPT_RT enabled doesn't actually block IRQs. In 3.6, the irq_cpu_rmap functions relies on a workqueue flush to finish any outstanding notifications before freeing the cpu_rmap that they use. This won't be reliable if the notification is scheduled after releasing the irq_desc lock. However, following commit 896f97ea95c1 ('lib: cpu_rmap: avoid flushing all workqueues') in 3.8, I think that it is sufficient to do only kref_get(desc-affinity_notify-kref) in __irq_set_affinity_locked() and then call schedule_work() in irq_set_affinity() after releasing the lock. Something like this (untested): diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c index 9d36774..0406481 100644 --- a/arch/mips/cavium-octeon/octeon-irq.c +++ b/arch/mips/cavium-octeon/octeon-irq.c @@ -635,7 +635,8 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data) cpumask_clear(new_affinity); cpumask_set_cpu(cpumask_first(cpu_online_mask), new_affinity); } - __irq_set_affinity_locked(data, new_affinity); + /* XXX No-one else calls this; why does this chip need it? 
*/ + __irq_set_affinity_locked(data, new_affinity, NULL); } static int octeon_irq_ciu_set_affinity(struct irq_data *data, diff --git a/include/linux/irq.h b/include/linux/irq.h index f04d3ba..de992f4 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -380,7 +380,9 @@ extern void remove_percpu_irq(unsigned int irq, struct irqaction *act); extern void irq_cpu_online(void); extern void irq_cpu_offline(void); -extern int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask); +extern int __irq_set_affinity_locked(struct irq_data *data, +const struct cpumask *cpumask, +struct irq_affinity_notify **notify); #ifdef CONFIG_GENERIC_HARDIRQS diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 514bcfd..157afa2 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -162,12 +162,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, return ret; } -int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask) +int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, + struct irq_affinity_notify **notify) { struct irq_chip *chip = irq_data_get_irq_chip(data); struct irq_desc *desc =