Re: IRQ handling difference between i386 and x86_64

2007-07-02 Thread Eric W. Biederman
Chris Snook <[EMAIL PROTECTED]> writes:

> Krzysztof Oledzki wrote:
>>
>>
>> On Sat, 30 Jun 2007, Arjan van de Ven wrote:
>>
>>> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
 Hello,

 It seems that IRQ handling is somehow different between i386 and x86_64.

 In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
 over all CPUs. This is a single-CPU, quad-core system (Quad-Core E5335 Xeon)
 so I think that interrupt migration may be useful. Unfortunately, it
 works only with a 32-bit kernel. Booting with x86_64 leads to a situation
 where all interrupts go only to the first CPU matching the smp_affinity
 mask.
>>>
>>> arguably that is the most efficient behavior... round robin of
>>> interrupts is the worst possible case in terms of performance
>>
>> Even on dual/quad-core CPUs with a shared cache? So why is it possible to
>> enable such behaviour in the BIOS, which works only on i386 BTW? :(
>>
>>> are you using irqbalance ? (www.irqbalance.org)
>>
>> Yes, I'm aware of this useful tool, but in some situations (routing, for
>> example) it cannot help much, as it keeps three CPUs idle. :(
>>
>> Best regards,
>>
>> Krzysztof Olędzki
>
> Interleaving interrupt delivery will completely break TCP header prediction,
> and cost you far more CPU time than it will save.  In fact, because of the
> locking, it will probably scale negatively with the number of CPUs, if your
> workload is mostly TCP/IP processing.  The way around this is to ensure that
> the packets for any given TCP socket are all delivered to the same processor.
> If you have multiple NICs and use 802.3ad bonding with layer3+4 hashing,
> header prediction will work fine, and you don't have to disable irqbalance,
> because it will do the right thing.

Regardless, this mostly appears to be a case of running in a different
ioapic mode, or possibly of the software irq balance logic, since
arch/x86_64 does not have an in-kernel irqbalance.

Hardware does automatic balancing in lowest-priority logical delivery mode.
All balancing must be done in software in flat physical delivery mode.

My guess, based on compile options etc., is that you are using flat
physical delivery mode on x86_64.
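Since nothing in hardware spreads interrupts in this mode, any distribution
has to come from user space by writing the affinity mask. A minimal sketch,
assuming a hypothetical IRQ 16 on a four-core box (the actual write requires
root, so it is left commented out):

```shell
# Build a hex affinity mask covering CPUs 0-3, one bit per CPU.
cpus=4
mask=$(printf '%x' $(( (1 << cpus) - 1 )))
echo "$mask"    # prints: f
# On a live system, pin IRQ 16 to that CPU set with (as root):
#   echo "$mask" > /proc/irq/16/smp_affinity
# then watch /proc/interrupts to see where the interrupts actually land.
```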

If the BIOS has an option to mess with this, you are probably dealing
with a system that has known ioapic-related hardware bugs, and enabling
the option in the BIOS is likely enabling those hardware problems.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: IRQ handling difference between i386 and x86_64

2007-07-02 Thread Vaidyanathan Srinivasan

Arjan van de Ven wrote:
> On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> It seems that IRQ handling is somehow different between i386 and x86_64.
>>
>> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
>> over all CPUs. This is a single-CPU, quad-core system (Quad-Core E5335 Xeon)
>> so I think that interrupt migration may be useful. Unfortunately, it
>> works only with a 32-bit kernel. Booting with x86_64 leads to a situation
>> where all interrupts go only to the first CPU matching the smp_affinity
>> mask.
> 
> arguably that is the most efficient behavior... round robin of
> interrupts is the worst possible case in terms of performance
> 
> are you using irqbalance ? (www.irqbalance.org)

If you have not been using irqbalance, then setting more bits in
/proc/irq/<n>/smp_affinity will utilize hardware APIC routing.  Sending an
interrupt to more than one CPU works in (logical) flat mode only.

In a 32-bit kernel the IOAPIC is configured in flat mode, while in a
64-bit kernel it is in physical flat mode.  Physical flat mode supports
interrupt routing to only one CPU.

32-bit: Enabling APIC mode:  Flat.  Using 2 I/O APICs
64-bit: Setting APIC routing to physical flat

This is the reason for the observed interrupt routing behavior.  I
expect future kernels will use 'physical flat' mode to avoid hardware
bugs on various complex configurations and also to scale up to larger
systems.  Also, utilizing hardware routing to send interrupts to many
CPUs may not provide the desired behavior.
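The routing mode the kernel picked can be read straight from the boot log, as
in the two lines quoted above. A self-contained sketch of the check (the
sample line is hard-coded here; on a real system you would match against
`dmesg` output instead):

```shell
# Classify the APIC routing mode from a boot-log line.  The sample is the
# 64-bit message quoted above; substitute dmesg output on a real machine.
line='Setting APIC routing to physical flat'
case "$line" in
  *'physical flat'*) mode='physical flat' ;;  # each IRQ targets one CPU
  *[Ff]lat*)         mode='logical flat'  ;;  # hardware may pick among CPUs
esac
echo "$mode"    # prints: physical flat
```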

The irqbalance application will take care of this.  I agree with Arjan
that equal interrupt distribution need not provide the best
performance.  You will run into complex routing issues only when you
have more NICs than CPU cores.

--Vaidy





Re: IRQ handling difference between i386 and x86_64

2007-07-01 Thread Chris Snook

Krzysztof Oledzki wrote:



On Sat, 30 Jun 2007, Arjan van de Ven wrote:


On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:

Hello,

It seems that IRQ handling is somehow different between i386 and x86_64.

In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
over all CPUs. This is a single-CPU, quad-core system (Quad-Core E5335 Xeon)
so I think that interrupt migration may be useful. Unfortunately, it
works only with a 32-bit kernel. Booting with x86_64 leads to a situation
where all interrupts go only to the first CPU matching the smp_affinity
mask.


arguably that is the most efficient behavior... round robin of
interrupts is the worst possible case in terms of performance


Even on dual/quad-core CPUs with a shared cache? So why is it possible
to enable such behaviour in the BIOS, which works only on i386 BTW? :(



are you using irqbalance ? (www.irqbalance.org)


Yes, I'm aware of this useful tool, but in some situations (routing,
for example) it cannot help much, as it keeps three CPUs idle. :(


Best regards,

Krzysztof Olędzki


Interleaving interrupt delivery will completely break TCP header 
prediction, and cost you far more CPU time than it will save.  In fact, 
because of the locking, it will probably scale negatively with the 
number of CPUs, if your workload is mostly TCP/IP processing.  The way 
around this is to ensure that the packets for any given TCP socket are 
all delivered to the same processor.  If you have multiple NICs and use 
802.3ad bonding with layer3+4 hashing, header prediction will work fine, 
and you don't have to disable irqbalance, because it will do the right 
thing.
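The reason layer3+4 hashing keeps header prediction intact is that the slave
NIC, and therefore the IRQ line and CPU, is chosen by hashing the flow's
addresses and ports, so every packet of a given TCP socket takes the same
path. A toy illustration of that flow-stickiness (the port numbers and the
two-slave setup are made up, and the real bonding driver's hash function
differs):

```shell
# Toy layer3+4-style selection: all packets of one flow map to one slave,
# because the hash inputs never change for the life of the socket.
src_port=43210; dst_port=80; slaves=2
slave=$(( (src_port ^ dst_port) % slaves ))
echo "flow mapped to slave $slave"    # prints: flow mapped to slave 0
```

Because the tuple is fixed per socket, the receive path, and with it the TCP
state touched per packet, stays on one CPU.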


-- Chris


Re: IRQ handling difference between i386 and x86_64

2007-06-30 Thread Krzysztof Oledzki



On Sat, 30 Jun 2007, Arjan van de Ven wrote:


On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:

Hello,

It seems that IRQ handling is somehow different between i386 and x86_64.

In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
over all CPUs. This is a single-CPU, quad-core system (Quad-Core E5335 Xeon)
so I think that interrupt migration may be useful. Unfortunately, it
works only with a 32-bit kernel. Booting with x86_64 leads to a situation
where all interrupts go only to the first CPU matching the smp_affinity
mask.


arguably that is the most efficient behavior... round robin of
interrupts is the worst possible case in terms of performance


Even on dual/quad-core CPUs with a shared cache? So why is it possible to
enable such behaviour in the BIOS, which works only on i386 BTW? :(



are you using irqbalance ? (www.irqbalance.org)


Yes, I'm aware of this useful tool, but in some situations (routing,
for example) it cannot help much, as it keeps three CPUs idle. :(


Best regards,

Krzysztof Olędzki

Re: IRQ handling difference between i386 and x86_64

2007-06-30 Thread Arjan van de Ven
On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:
> Hello,
> 
> It seems that IRQ handling is somehow different between i386 and x86_64.
> 
> In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
> over all CPUs. This is a single-CPU, quad-core system (Quad-Core E5335 Xeon)
> so I think that interrupt migration may be useful. Unfortunately, it
> works only with a 32-bit kernel. Booting with x86_64 leads to a situation
> where all interrupts go only to the first CPU matching the smp_affinity
> mask.

arguably that is the most efficient behavior... round robin of
interrupts is the worst possible case in terms of performance

are you using irqbalance ? (www.irqbalance.org)



