Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
> > the numa case of "I prefer that cpu" is handled. Not the "I cannot work on > > those". > > How is the NUMA case of I prefer that cpu handled? it's exported via /sys/bus/pci/devices//local_cpus (and the irq is in the /irq directory next to local_cpus) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Arjan van de Ven <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >>> 1) is very real today >>> 2) is partially real on some of the bigger numa stuff already. >> >> You have said you the NUMA cases is handled in another way already? > > the numa case of "I prefer that cpu" is handled. Not the "I cannot work on > those". How is the NUMA case of I prefer that cpu handled? Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Eric W. Biederman wrote: 1) is very real today 2) is partially real on some of the bigger numa stuff already. You have said you the NUMA cases is handled in another way already? the numa case of "I prefer that cpu" is handled. Not the "I cannot work on those". - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Arjan van de Ven <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> What is the problem you are trying to solve? > > 2 problems > 1) irq's that irqbalance should not touch at all This is easy we just need a single bit. Not 128+ bytes on the huge machines. > 2) irqs that can only go to a subset of processors. > > 1) is very real today > 2) is partially real on some of the bigger numa stuff already. You have said you the NUMA cases is handled in another way already? In which case irqs that can only got to a subset of processors shouldn't be a problem. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Eric W. Biederman wrote: What is the problem you are trying to solve? 2 problems 1) irq's that irqbalance should not touch at all 2) irqs that can only go to a subset of processors. 1) is very real today 2) is partially real on some of the bigger numa stuff already. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
On 12/13/06, Eric W. Biederman <[EMAIL PROTECTED]> wrote: Arjan van de Ven <[EMAIL PROTECTED]> writes: > the numa case is already handled; the needed info for that is exposed already > enough... at least for irqbalance What is the problem you are trying to solve? If it is just interrupts irqbalanced can't help with we can do that with a single bit. My basic problem with understanding what this patch is trying to solve is that I've seen some theoretical cases raised but I don't see the real world problem. Good question. irqbalance could use set_affinity from boot cpu to final cpu. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Arjan van de Ven <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: > >> There is still a question of how to handle the NUMA case but... >> > > the numa case is already handled; the needed info for that is exposed already > enough... at least for irqbalance What is the problem you are trying to solve? If it is just interrupts irqbalanced can't help with we can do that with a single bit. My basic problem with understanding what this patch is trying to solve is that I've seen some theoretical cases raised but I don't see the real world problem. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Eric W. Biederman wrote: There is still a question of how to handle the NUMA case but... the numa case is already handled; the needed info for that is exposed already enough... at least for irqbalance - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Arjan van de Ven <[EMAIL PROTECTED]> writes: >> . >> >> In addition the cases I can think of allowed_affinity is the wrong >> name. suggested_affinity sounds like what you are trying to implement >> and when it is merely a suggestion and not a hard limit it doesn't >> make sense to export like this. > > it really IS a hard limit. Ok. Which generally makes it uninteresting. The only cases that I know with a hard limit are completely unrouteable. In addition upon reflection you don't handle PER_CPU irqs properly. As I recall ia64 uses a different per cpu irq source to target each individual processor. But because they are the same the share the irq source. I don't think allowed_affinity can even describe the case above. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
* Eric W. Biederman <[EMAIL PROTECTED]> wrote: > > also there might be hardware that can only route a given IRQ to a > > subset of CPUs. While setting set_affinity allows the > > irqbalance-daemon to 'probe' this mask, it's a far from optimal API. > > I agree, I am just arguing that adding another awkward interface to > the current situation does not really make the situation better, and > it increases our support burden. well, please suggest a better interface then. > For a bunch of this it is arguable that the way to go is simply to > parse the irq type in /proc/interrupts. All of the really weird cases > will have a distinct type there. This certainly captures the MSI-X > case. There is still a question of how to handle the NUMA case but... ... so parsing /proc/interrupts should be that interface? That is a historically very volatile interface. It's mostly human-parsed, and we frequently twiddle it - genirq changed it too. In v2.6.19 we had fasteio instead of fasteoi there. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Ingo Molnar <[EMAIL PROTECTED]> writes: > * Eric W. Biederman <[EMAIL PROTECTED]> wrote: > >> In addition the cases I can think of allowed_affinity is the wrong >> name. suggested_affinity sounds like what you are trying to implement >> and when it is merely a suggestion and not a hard limit it doesn't >> make sense to export like this. > > well, there are interrupts that must be tied to a single CPU and must > never be moved away. For example per-CPU clock-events-source interrupts > are such. So allowed_affinity very much exists. Although in that case since it is a single cpu there is a much more elegant implementation. We don't need a full cpumask_t to describe it. > also there might be hardware that can only route a given IRQ to a subset > of CPUs. While setting set_affinity allows the irqbalance-daemon to > 'probe' this mask, it's a far from optimal API. I agree, I am just arguing that adding another awkward interface to the current situation does not really make the situation better, and it increases our support burden. For a bunch of this it is arguable that the way to go is simply to parse the irq type in /proc/interrupts. All of the really weird cases will have a distinct type there. This certainly captures the MSI-X case. There is still a question of how to handle the NUMA case but... Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
> . > > In addition the cases I can think of allowed_affinity is the wrong > name. suggested_affinity sounds like what you are trying to implement > and when it is merely a suggestion and not a hard limit it doesn't > make sense to export like this. it really IS a hard limit. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
* Eric W. Biederman <[EMAIL PROTECTED]> wrote: > In addition the cases I can think of allowed_affinity is the wrong > name. suggested_affinity sounds like what you are trying to implement > and when it is merely a suggestion and not a hard limit it doesn't > make sense to export like this. well, there are interrupts that must be tied to a single CPU and must never be moved away. For example per-CPU clock-events-source interrupts are such. So allowed_affinity very much exists. also there might be hardware that can only route a given IRQ to a subset of CPUs. While setting set_affinity allows the irqbalance-daemon to 'probe' this mask, it's a far from optimal API. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
Arjan van de Ven <[EMAIL PROTECTED]> writes: > [due to a broken libata in current -git I've not been able to test this patch > enough] > > > This patch adds an "allowed_affinity" mask to each interrupt, in addition to > the > existing actual affinity mask. In addition this new mask is exported to > userspace > in a similar way as the actual affinity is exported. (this is so that > irqbalance > can find out about the restriction and take it into account) > > The purpose for having this mask is to allow for the situation where > interrupts > just can't or shouldn't go to all cpus; one example is the "per cpu" IRQ > thing > that powerpc and others have. Another soon-to-come example is MSIX devices > that > can generate a different MSI interrupt for each cpu; in that case the MSI > needs > to > be strictly constrained to it's designated cpu. I don't like this. If we really have constraints that limit which cpus we can handle a irqs on we should be taking advantage of them to make the data structure smaller, not making it bigger. This feels like a forced fit. To fit the situations you describe into allowed_affinity you have to throw away information. Such as the fact it is a NUMA node or a MSI-X interrupt. In addition the cases I can think of allowed_affinity is the wrong name. suggested_affinity sounds like what you are trying to implement and when it is merely a suggestion and not a hard limit it doesn't make sense to export like this. The restriction with MSIX is that we want the irq that describes the flow of data and the user space process that consumes that flow of data scheduled on the same cpu. Yes this is mostly one per cpu but it is not strictly one per cpu so allowed cpus doesn't make sense in that case. The more I think about describing what we are schedule, we are scheduling a more or less periodic realtime process don't we already have schedulers in the kernel to schedule this kind of thing? Don't we want to take into consideration the cache penalty of migration before we decided to move the irq to another cpu? Anyway if we are going to start exporting this kind of information please let's export the whole context and export it in a way that applications besides irqbalance can take advantage of. Otherwise we are just committing to support a wide kernel/user space interface for the purposes of a single application and that seems ridiculous. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] Add allowed_affinity to the irq_desc to make it possible to have restricted irqs
[due to a broken libata in current -git I've not been able to test this patch enough] This patch adds an "allowed_affinity" mask to each interrupt, in addition to the existing actual affinity mask. In addition this new mask is exported to userspace in a similar way as the actual affinity is exported. (this is so that irqbalance can find out about the restriction and take it into account) The purpose for having this mask is to allow for the situation where interrupts just can't or shouldn't go to all cpus; one example is the "per cpu" IRQ thing that powerpc and others have. Another soon-to-come example is MSIX devices that can generate a different MSI interrupt for each cpu; in that case the MSI needs to be strictly constrained to it's designated cpu. Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> --- include/linux/irq.h |2 ++ kernel/irq/chip.c |1 + kernel/irq/handle.c |3 ++- kernel/irq/manage.c |5 - kernel/irq/proc.c | 27 +++ 5 files changed, 36 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/irq.h === --- linux-2.6.orig/include/linux/irq.h +++ linux-2.6/include/linux/irq.h @@ -137,6 +137,7 @@ struct irq_chip { * @irqs_unhandled:stats field for spurious unhandled interrupts * @lock: locking for SMP * @affinity: IRQ affinity on SMP + * @allowed_affinity The allowed affinity for this IRQ * @cpu: cpu index useful for balancing * @pending_mask: pending rebalanced interrupts * @dir: /proc/irq/ procfs entry @@ -160,6 +161,7 @@ struct irq_desc { spinlock_t lock; #ifdef CONFIG_SMP cpumask_t affinity; + cpumask_t allowed_affinity; unsigned intcpu; #endif #if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE) Index: linux-2.6/kernel/irq/chip.c === --- linux-2.6.orig/kernel/irq/chip.c +++ linux-2.6/kernel/irq/chip.c @@ -46,6 +46,7 @@ void dynamic_irq_init(unsigned int irq) desc->irqs_unhandled = 0; #ifdef CONFIG_SMP desc->affinity = CPU_MASK_ALL; + desc->allowed_affinity = CPU_MASK_ALL; #endif spin_unlock_irqrestore(&desc->lock, flags); } Index: linux-2.6/kernel/irq/handle.c === --- linux-2.6.orig/kernel/irq/handle.c +++ linux-2.6/kernel/irq/handle.c @@ -56,7 +56,8 @@ struct irq_desc irq_desc[NR_IRQS] __cach .depth = 1, .lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock), #ifdef CONFIG_SMP - .affinity = CPU_MASK_ALL + .affinity = CPU_MASK_ALL, + .allowed_affinity = CPU_MASK_ALL #endif } }; Index: linux-2.6/kernel/irq/manage.c === --- linux-2.6.orig/kernel/irq/manage.c +++ linux-2.6/kernel/irq/manage.c @@ -278,8 +278,13 @@ int setup_irq(unsigned int irq, struct i *p = new; #if defined(CONFIG_IRQ_PER_CPU) - if (new->flags & IRQF_PERCPU) + if (new->flags & IRQF_PERCPU) { desc->status |= IRQ_PER_CPU; + /* don't allow affinity to be set for per cpu interrupts */ +#ifdef CONFIG_SMP + desc->allowed_affinity = CPU_MASK_NONE; +#endif + } #endif if (!shared) { irq_chip_set_defaults(desc->chip); Index: linux-2.6/kernel/irq/proc.c === --- linux-2.6.orig/kernel/irq/proc.c +++ linux-2.6/kernel/irq/proc.c @@ -47,6 +47,20 @@ static int irq_affinity_read_proc(char * return len; } + +static int irq_affinity_read_allowed_proc(char *page, char **start, off_t off, + int count, int *eof, void *data) +{ + int len = cpumask_scnprintf(page, count, irq_desc[(long)data].allowed_affinity); + + if (count - len < 2) + return -EINVAL; + len += sprintf(page + len, "\n"); + return len; +} + + + int no_irq_affinity; static int irq_affinity_write_proc(struct file *file, const char __user *buffer, unsigned long count, void *data) @@ -62,6 +76,9 @@ static int irq_affinity_write_proc(struc if (err) return err; + /* mask off the allowed_affinity mask to only leave legal cpus */ + cpus_and(new_value, new_value, irq_desc[irq].allowed_affinity); + /* * Do not allow disabling IRQs completely - it's a too easy * way to make the system unusable accidentally :-) At least @@ -141,6 +158,16 @@ void register_irq_proc(unsigned int irq) entry->read_proc = irq_affinity_read_proc; entry->write_proc = irq_affinity_write_proc; } + + /* cre