Re: IRQ handling difference between i386 and x86_64
Chris Snook <[EMAIL PROTECTED]> writes:

> Krzysztof Oledzki wrote:
>> On Sat, 30 Jun 2007, Arjan van de Ven wrote:
>>> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
>>>> Hello,
>>>>
>>>> It seems that IRQ handling is somehow different between i386 and x86_64.
>>>>
>>>> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
>>>> over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335
>>>> Xeon), so I think that interrupt migration may be useful. Unfortunately,
>>>> it works only with a 32-bit kernel. Booting it with x86_64 leads to a
>>>> situation where all interrupts go only to the first CPU matching the
>>>> smp_affinity mask.
>>>
>>> arguably that is the most efficient behavior... round robin of
>>> interrupts is the worst possible case in terms of performance
>>
>> Even on dual/quad-core CPUs with a shared cache? So why is it possible to
>> enable such behaviour in the BIOS, which works only on i386, BTW? :(
>>
>>> are you using irqbalance? (www.irqbalance.org)
>>
>> Yes, I'm aware of this useful tool, but in some situations (routing, for
>> example) it cannot help much, as it keeps three CPUs idle. :(
>>
>> Best regards,
>>
>> Krzysztof Olędzki
>
> Interleaving interrupt delivery will completely break TCP header
> prediction, and cost you far more CPU time than it will save. In fact,
> because of the locking, it will probably scale negatively with the number
> of CPUs, if your workload is mostly TCP/IP processing. The way around this
> is to ensure that the packets for any given TCP socket are all delivered
> to the same processor. If you have multiple NICs and use 802.3ad bonding
> with layer3+4 hashing, header prediction will work fine, and you don't
> have to disable irqbalance, because it will do the right thing.

Regardless, this mostly appears to be a case of running in a different
ioapic mode, or possibly of the software irq balance logic, as arch/x86_64
does not have in-kernel irqbalance.

Hardware does automatic balancing in lowest-priority logical delivery mode.
All balancing must be done in software in flat physical delivery mode. I'm
guessing, due to compile options etc., you are using flat physical delivery
mode on x86_64.

If the BIOS has an option to mess with this, you are probably playing with
a system that has known ioapic-related hardware bugs, and enabling it in
the BIOS is likely enabling hardware problems.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
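The two delivery modes Eric contrasts show up directly in the kernel boot log. A minimal sketch that classifies the two log lines quoted in this thread (on a live system you would pipe `dmesg` in instead of passing sample strings):

```shell
#!/bin/sh
# Classify the IOAPIC delivery mode from a kernel boot-log line.
classify_apic() {
    case "$1" in
        *"physical flat"*)
            echo "physical-flat: software must balance, one CPU per IRQ" ;;
        *"APIC mode:"*"Flat"*)
            echo "logical-flat: hardware lowest-priority balancing possible" ;;
        *)
            echo "unknown" ;;
    esac
}

classify_apic "Setting APIC routing to physical flat"
classify_apic "Enabling APIC mode: Flat. Using 2 I/O APICs"
```

The match strings are taken verbatim from the 32-bit and 64-bit boot messages cited in this thread; kernel versions outside this era may word them differently.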
Re: IRQ handling difference between i386 and x86_64
Arjan van de Ven wrote:
> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> It seems that IRQ handling is somehow different between i386 and x86_64.
>>
>> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
>> over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335
>> Xeon), so I think that interrupt migration may be useful. Unfortunately,
>> it works only with a 32-bit kernel. Booting it with x86_64 leads to a
>> situation where all interrupts go only to the first CPU matching the
>> smp_affinity mask.
>
> arguably that is the most efficient behavior... round robin of
> interrupts is the worst possible case in terms of performance
>
> are you using irqbalance? (www.irqbalance.org)

If you have not been using irqbalance, then setting more bits in
irq/smp_affinity will utilize hardware APIC routing. Sending an interrupt
to more than one CPU works in flat mode only. In a 32-bit kernel the IOAPIC
is configured in flat mode, while in 64-bit it is in physical flat mode.
Physical flat mode supports interrupt routing to only one CPU.

32-bit: Enabling APIC mode: Flat. Using 2 I/O APICs
64-bit: Setting APIC routing to physical flat

This is the reason for the observed interrupt routing behavior. I guess
future kernels will use 'physical flat' mode to avoid hardware bugs on
various complex configurations and also to make it scale up to larger
systems. Also, utilizing hardware routing to send interrupts to many CPUs
may not provide the desired behavior; the irqbalance application will do
the needful.

I agree with Arjan that equal interrupt distribution need not provide the
best performance. You will run into complex routing issues only when you
have more NICs than the actual number of CPU cores.
--Vaidy
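The "setting more bits in irq/smp_affinity" step can be sketched as follows. The IRQ number here is hypothetical; pick a real one for your NIC from /proc/interrupts:

```shell
#!/bin/sh
# Hypothetical IRQ number; look up the real one in /proc/interrupts.
IRQ=16

# Build a hex affinity mask from a list of CPU indices (pure arithmetic).
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0          # -> 1 (CPU0 only)
cpus_to_mask 0 1 2 3    # -> f (all four cores of the E5335)

# In logical flat mode (the 32-bit case here) a multi-bit mask lets the
# hardware distribute the interrupt; in physical flat mode (x86_64 here)
# only one CPU from the mask is honored.
# echo "$(cpus_to_mask 0 1 2 3)" > /proc/irq/$IRQ/smp_affinity
```

The actual write is left commented out since it requires root and takes effect only as the APIC mode permits.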
Re: IRQ handling difference between i386 and x86_64
Krzysztof Oledzki wrote:
> On Sat, 30 Jun 2007, Arjan van de Ven wrote:
>> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
>>> Hello,
>>>
>>> It seems that IRQ handling is somehow different between i386 and x86_64.
>>>
>>> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
>>> over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335
>>> Xeon), so I think that interrupt migration may be useful. Unfortunately,
>>> it works only with a 32-bit kernel. Booting it with x86_64 leads to a
>>> situation where all interrupts go only to the first CPU matching the
>>> smp_affinity mask.
>>
>> arguably that is the most efficient behavior... round robin of
>> interrupts is the worst possible case in terms of performance
>
> Even on dual/quad-core CPUs with a shared cache? So why is it possible to
> enable such behaviour in the BIOS, which works only on i386, BTW? :(
>
>> are you using irqbalance? (www.irqbalance.org)
>
> Yes, I'm aware of this useful tool, but in some situations (routing, for
> example) it cannot help much, as it keeps three CPUs idle. :(
>
> Best regards,
>
> Krzysztof Olędzki

Interleaving interrupt delivery will completely break TCP header
prediction, and cost you far more CPU time than it will save. In fact,
because of the locking, it will probably scale negatively with the number
of CPUs, if your workload is mostly TCP/IP processing. The way around this
is to ensure that the packets for any given TCP socket are all delivered
to the same processor. If you have multiple NICs and use 802.3ad bonding
with layer3+4 hashing, header prediction will work fine, and you don't
have to disable irqbalance, because it will do the right thing.

-- Chris
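The flow-pinning property Chris relies on can be illustrated with a deliberately simplified hash. This is NOT the exact bonding-driver layer3+4 algorithm, just a sketch of why every packet of one TCP socket lands on the same slave (and hence the same CPU):

```shell
#!/bin/sh
# Toy layer3/4-style flow hash: deterministic in the flow's port pair,
# so all packets of one connection map to the same slave index.
flow_slave() {  # args: src_port dst_port n_slaves
    echo $(( ($1 ^ $2) % $3 ))
}

# Two packets of the same connection always pick the same slave...
flow_slave 51000 80 2   # -> 0
flow_slave 51000 80 2   # -> 0
# ...while a neighboring connection may pick the other one.
flow_slave 51001 80 2   # -> 1
```

The real driver hash also folds in IP addresses, but the per-flow determinism shown here is the property that keeps TCP header prediction intact.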
Re: IRQ handling difference between i386 and x86_64
On Sat, 30 Jun 2007, Arjan van de Ven wrote:

> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> It seems that IRQ handling is somehow different between i386 and x86_64.
>>
>> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
>> over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335
>> Xeon), so I think that interrupt migration may be useful. Unfortunately,
>> it works only with a 32-bit kernel. Booting it with x86_64 leads to a
>> situation where all interrupts go only to the first CPU matching the
>> smp_affinity mask.
>
> arguably that is the most efficient behavior... round robin of
> interrupts is the worst possible case in terms of performance

Even on dual/quad-core CPUs with a shared cache? So why is it possible to
enable such behaviour in the BIOS, which works only on i386, BTW? :(

> are you using irqbalance? (www.irqbalance.org)

Yes, I'm aware of this useful tool, but in some situations (routing, for
example) it cannot help much, as it keeps three CPUs idle. :(

Best regards,

Krzysztof Olędzki
Re: IRQ handling difference between i386 and x86_64
On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
> Hello,
>
> It seems that IRQ handling is somehow different between i386 and x86_64.
>
> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
> over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335
> Xeon), so I think that interrupt migration may be useful. Unfortunately,
> it works only with a 32-bit kernel. Booting it with x86_64 leads to a
> situation where all interrupts go only to the first CPU matching the
> smp_affinity mask.

arguably that is the most efficient behavior... round robin of
interrupts is the worst possible case in terms of performance

are you using irqbalance? (www.irqbalance.org)
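Whether interrupts are in fact sticking to one CPU is visible in the per-CPU columns of /proc/interrupts. A small sketch using a fabricated sample row for a 4-core box (on a real system you would grep the NIC's line out of /proc/interrupts instead):

```shell
#!/bin/sh
# Fabricated sample row in the /proc/interrupts format of this era;
# columns 2-5 are the per-CPU delivery counts on a 4-core machine.
row="  16:  1200000        0        0        0   IO-APIC-fasteoi   eth0"

# Summarize total deliveries and CPU0's share of them.
irq_summary() {
    echo "$1" | awk '{
        for (i = 2; i <= 5; i++) total += $i
        printf "total=%d cpu0_share=%d%%\n", total, 100 * $2 / total
    }'
}

irq_summary "$row"   # -> total=1200000 cpu0_share=100%
```

A 100% share on CPU0, as in this sample, is exactly the physical-flat behavior reported for x86_64; a working spread (logical flat plus a multi-bit smp_affinity mask, or irqbalance) would show the counts distributed across the columns.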