Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Thomas Gleixner wrote:
> On Wed, 28 May 2014, Jiang Liu wrote:
> > This is used to work around special non-ISA interrupts with GSI below
> > NR_IRQS_LEGACY. The original code for the special case is:
> > /*
> >  * Provide an identity mapping of gsi == irq except on truly
> >  * weird platforms that have non isa irqs in the first 16 gsis.
> >  */
> > return gsi >= NR_IRQS_LEGACY ? gsi : gsi_top + gsi;
> 
> That looks really, really wrong. What's wrong with assigning that irq
> irq number on those platforms?
> 
> The weird stuff is SFI and devicetree, if I understand your code
> correctly.
> 
> So if those platforms do not have actual legacy irqs, what's wrong
> with giving out the legacy numbers?

Though if it's a real issue, we can simply mark the ioapic on those
devices as DYNAMIC and avoid that whole magic.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Jiang Liu wrote:
> On 2014/5/28 3:58, Thomas Gleixner wrote:
> > So you have these cases covered here:
> > 
> > 1) The ACPI case of secondary ioapics. You only have the strict 1:1
> >mapping for the first ioapic
> > 
> > 2) The gsi < NR_IRQS_LEGACY case where you have two options:
> > 
> > a) Let the core create a random virq number if ioapic_identity_map
> >is 0
> > 
> >ioapic_identity_map is only set by SFI and devicetree
> > 
> >So in all other cases we fall into that code path for all
> >legacy interrupts. So how is that supposed to work lets say for
> >i8042 which has hardcoded irq 1 and 12?
> > 
> >irq_create_mapping(1)
> >
> > hint = 1 % nr_irqs; --> 1
> > virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
> > 
> > This returns something >= 16, because the irq descriptors
> > for 0-15 (LEGACY) are allocated already.
> > 
> >The pin association works, but how is the i8042 driver supposed
> >to figure out that it should request the virq >=16 which was
> >created instead of the hardcoded 1 ?
> This is used to work around special non-ISA interrupts with GSI below
> NR_IRQS_LEGACY. The original code for the special case is:
> /*
>  * Provide an identity mapping of gsi == irq except on truly
>  * weird platforms that have non isa irqs in the first 16 gsis.
>  */
> return gsi >= NR_IRQS_LEGACY ? gsi : gsi_top + gsi;

That looks really, really wrong. What's wrong with assigning that irq
irq number on those platforms?

The weird stuff is SFI and devicetree, if I understand your code
correctly.

So if those platforms do not have actual legacy irqs, what's wrong
with giving out the legacy numbers?

> We have one path to handle ISA IRQs before calling
> alloc_irq_from_domain() as below:
> if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci))
> return mp_irqs[idx].srcbusirq;

Ok.
 
> > /* We can't set this earlier, because we need to calibrate the 
> > timer */
> > legacy_pic = _legacy_pic;
> I haven't figured out the story behind the comment yet:(

Sebastian gave some insight.
 
> > Why do we need strict mappings in the non ACPI case for all ioapic
> > pins? What's so different about ACPI? Or is this just to avoid
> > breaking the existing SFI/devicetree stuff. If that's the reason I'm
> > fine with it, but ...
> It's to avoid breaking SFI/intel_mid stuff. intel_mid assumes IRQ
> number equals to pin number and use pci_dev->irq to save both IRQ
> number and pin number.

Fair enough.

Thanks,

tglx

 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Sebastian Andrzej Siewior
On 05/28/2014 12:07 PM, Thomas Gleixner wrote:
> 
> Right, so it needs the setup of irq 0 and that only happens when the
> legacy_pic->nr_legacy_irqs > 0.
>
> Do you remember, why we switch to the null_pic later on?

According to my memory all interrupts are serviced by the IOAPIC. The
first 16 by the first IOAPIC, 17+ (everything behind the "PCI-bus") by
the second IOAPIC. For that reason I didn't see a reason for the legacy
PIC and set it to null_pic. Due to the timer problem it is delayed.

> Thanks,
> 
>   tglx

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Sebastian Andrzej Siewior wrote:
> On 05/27/2014 09:58 PM, Thomas Gleixner wrote:
> > ce4100 is an oddball though. The ioapic is registered way before the
> > interrupt subsystem is initialized and I have a hard time to
> > understand that comment:
> > 
> > /* We can't set this earlier, because we need to calibrate the 
> > timer */
> > legacy_pic = _legacy_pic;
> > 
> > The timer calibration happens after the interrupts are set up. I
> > assume it's check_timer() which wants that, but we know exactly how
> > the ce4100 works, so we might be able to avoid that whole "testing"
> > stuff. Sebastian, any input on this?
> 
> According to my memory there was some PIC init we need but I don't
> recall the details. However, booting the system with "legacy_pic =
> _legacy_pic;" in x86_ce4100_early_setup() gives me this:
> 
> [0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
> [0.001000] leaving PIC mode, enabling APIC mode.
> [0.001000] enabled ExtINT on CPU#0
> [0.001000] ENABLING IO-APIC IRQs
> [0.001000] Setting 1 in the phys_id_present_map
> [0.001000] ...changing IO-APIC physical APIC ID to 1 ... ok.
> [0.001000] Setting 2 in the phys_id_present_map
> [0.001000] ...changing IO-APIC physical APIC ID to 2 ... ok.
> [0.001000] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
> (fam: 06, model: 1c, stepping: 0a)
> [0.001000] Using local APIC timer interrupts.
> [0.001000] calibrating APIC timer ...
> 
> and we stand still. With the assignment as it is now:
> 
> [0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
> [0.001000] leaving PIC mode, enabling APIC mode.
> [0.001000] enabled ExtINT on CPU#0
> [0.002312] ENABLING IO-APIC IRQs
> [0.003009] Setting 1 in the phys_id_present_map
> [0.004007] ...changing IO-APIC physical APIC ID to 1 ... ok.
> [0.005533] Setting 2 in the phys_id_present_map
> [0.006006] ...changing IO-APIC physical APIC ID to 2 ... ok.
> [0.008373] ..TIMER: vector=0x30 apic1=-1 pin1=-1 apic2=-1 pin2=-1
> [0.009000] ...trying to set up timer as Virtual Wire IRQ...
> [0.019490] . works.

Right, so it needs the setup of irq 0 and that only happens when the
legacy_pic->nr_legacy_irqs > 0.

Do you remember, why we switch to the null_pic later on?

Thanks,

tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Sebastian Andrzej Siewior
On 05/27/2014 09:58 PM, Thomas Gleixner wrote:
> ce4100 is an oddball though. The ioapic is registered way before the
> interrupt subsystem is initialized and I have a hard time to
> understand that comment:
> 
> /* We can't set this earlier, because we need to calibrate the timer 
> */
> legacy_pic = _legacy_pic;
> 
> The timer calibration happens after the interrupts are set up. I
> assume it's check_timer() which wants that, but we know exactly how
> the ce4100 works, so we might be able to avoid that whole "testing"
> stuff. Sebastian, any input on this?

According to my memory there was some PIC init we need but I don't
recall the details. However, booting the system with "legacy_pic =
_legacy_pic;" in x86_ce4100_early_setup() gives me this:

[0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
[0.001000] leaving PIC mode, enabling APIC mode.
[0.001000] enabled ExtINT on CPU#0
[0.001000] ENABLING IO-APIC IRQs
[0.001000] Setting 1 in the phys_id_present_map
[0.001000] ...changing IO-APIC physical APIC ID to 1 ... ok.
[0.001000] Setting 2 in the phys_id_present_map
[0.001000] ...changing IO-APIC physical APIC ID to 2 ... ok.
[0.001000] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
(fam: 06, model: 1c, stepping: 0a)
[0.001000] Using local APIC timer interrupts.
[0.001000] calibrating APIC timer ...

and we stand still. With the assignment as it is now:

[0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
[0.001000] leaving PIC mode, enabling APIC mode.
[0.001000] enabled ExtINT on CPU#0
[0.002312] ENABLING IO-APIC IRQs
[0.003009] Setting 1 in the phys_id_present_map
[0.004007] ...changing IO-APIC physical APIC ID to 1 ... ok.
[0.005533] Setting 2 in the phys_id_present_map
[0.006006] ...changing IO-APIC physical APIC ID to 2 ... ok.
[0.008373] ..TIMER: vector=0x30 apic1=-1 pin1=-1 apic2=-1 pin2=-1
[0.009000] ...trying to set up timer as Virtual Wire IRQ...
[0.019490] . works.
[0.020005] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
(fam: 06, model: 1c, stepping: 0a)
[0.024005] Using local APIC timer interrupts.
[0.024005] calibrating APIC timer ...
[0.026000] ... lapic delta = 624992
[0.026000] ... PM-Timer delta = 0
[0.026000] . delta 624992
[0.026000] . mult: 26843202
[0.026000] . calibration result: 8
[0.026000] . CPU clock speed is 1199.0984 MHz.
[0.026000] . host bus clock speed is 99.0998 MHz.
[0.026000] ... verify APIC timer
[0.128565] ... jiffies delta = 100
[0.129004] ... jiffies result ok

and it goes on. As you see, "..TIMER:" is missing in the upper output.

> 
> Thanks,
> 
>   tglx

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Jiang Liu wrote:
 On 2014/5/28 3:58, Thomas Gleixner wrote:
  So you have these cases covered here:
  
  1) The ACPI case of secondary ioapics. You only have the strict 1:1
 mapping for the first ioapic
  
  2) The gsi  NR_IRQS_LEGACY case where you have two options:
  
  a) Let the core create a random virq number if ioapic_identity_map
 is 0
  
 ioapic_identity_map is only set by SFI and devicetree
  
 So in all other cases we fall into that code path for all
 legacy interrupts. So how is that supposed to work lets say for
 i8042 which has hardcoded irq 1 and 12?
  
 irq_create_mapping(1)
 
  hint = 1 % nr_irqs; -- 1
  virq = irq_alloc_desc_from(hint, of_node_to_nid(domain-of_node));
  
  This returns something = 16, because the irq descriptors
  for 0-15 (LEGACY) are allocated already.
  
 The pin association works, but how is the i8042 driver supposed
 to figure out that it should request the virq =16 which was
 created instead of the hardcoded 1 ?
 This is used to work around special non-ISA interrupts with GSI below
 NR_IRQS_LEGACY. The original code for the special case is:
 /*
  * Provide an identity mapping of gsi == irq except on truly
  * weird platforms that have non isa irqs in the first 16 gsis.
  */
 return gsi = NR_IRQS_LEGACY ? gsi : gsi_top + gsi;

That looks really, really wrong. What's wrong with assigning that irq
irq number on those platforms?

The weird stuff is SFI and devicetree, if I understand your code
correctly.

So if those platforms do not have actual legacy irqs, what's wrong
with giving out the legacy numbers?

 We have one path to handle ISA IRQs before calling
 alloc_irq_from_domain() as below:
 if (idx = 0  test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci))
 return mp_irqs[idx].srcbusirq;

Ok.
 
  /* We can't set this earlier, because we need to calibrate the 
  timer */
  legacy_pic = null_legacy_pic;
 I haven't figured out the story behind the comment yet:(

Sebastian gave some insight.
 
  Why do we need strict mappings in the non ACPI case for all ioapic
  pins? What's so different about ACPI? Or is this just to avoid
  breaking the existing SFI/devicetree stuff. If that's the reason I'm
  fine with it, but ...
 It's to avoid breaking SFI/intel_mid stuff. intel_mid assumes IRQ
 number equals to pin number and use pci_dev-irq to save both IRQ
 number and pin number.

Fair enough.

Thanks,

tglx

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Thomas Gleixner wrote:
 On Wed, 28 May 2014, Jiang Liu wrote:
  This is used to work around special non-ISA interrupts with GSI below
  NR_IRQS_LEGACY. The original code for the special case is:
  /*
   * Provide an identity mapping of gsi == irq except on truly
   * weird platforms that have non isa irqs in the first 16 gsis.
   */
  return gsi = NR_IRQS_LEGACY ? gsi : gsi_top + gsi;
 
 That looks really, really wrong. What's wrong with assigning that irq
 irq number on those platforms?
 
 The weird stuff is SFI and devicetree, if I understand your code
 correctly.
 
 So if those platforms do not have actual legacy irqs, what's wrong
 with giving out the legacy numbers?

Though if it's a real issue, we can simply mark the ioapic on those
devices as DYNAMIC and avoid that whole magic.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Sebastian Andrzej Siewior
On 05/27/2014 09:58 PM, Thomas Gleixner wrote:
 ce4100 is an oddball though. The ioapic is registered way before the
 interrupt subsystem is initialized and I have a hard time to
 understand that comment:
 
 /* We can't set this earlier, because we need to calibrate the timer 
 */
 legacy_pic = null_legacy_pic;
 
 The timer calibration happens after the interrupts are set up. I
 assume it's check_timer() which wants that, but we know exactly how
 the ce4100 works, so we might be able to avoid that whole testing
 stuff. Sebastian, any input on this?

According to my memory there was some PIC init we need but I don't
recall the details. However, booting the system with legacy_pic =
null_legacy_pic; in x86_ce4100_early_setup() gives me this:

[0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
[0.001000] leaving PIC mode, enabling APIC mode.
[0.001000] enabled ExtINT on CPU#0
[0.001000] ENABLING IO-APIC IRQs
[0.001000] Setting 1 in the phys_id_present_map
[0.001000] ...changing IO-APIC physical APIC ID to 1 ... ok.
[0.001000] Setting 2 in the phys_id_present_map
[0.001000] ...changing IO-APIC physical APIC ID to 2 ... ok.
[0.001000] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
(fam: 06, model: 1c, stepping: 0a)
[0.001000] Using local APIC timer interrupts.
[0.001000] calibrating APIC timer ...

and we stand still. With the assignment as it is now:

[0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
[0.001000] leaving PIC mode, enabling APIC mode.
[0.001000] enabled ExtINT on CPU#0
[0.002312] ENABLING IO-APIC IRQs
[0.003009] Setting 1 in the phys_id_present_map
[0.004007] ...changing IO-APIC physical APIC ID to 1 ... ok.
[0.005533] Setting 2 in the phys_id_present_map
[0.006006] ...changing IO-APIC physical APIC ID to 2 ... ok.
[0.008373] ..TIMER: vector=0x30 apic1=-1 pin1=-1 apic2=-1 pin2=-1
[0.009000] ...trying to set up timer as Virtual Wire IRQ...
[0.019490] . works.
[0.020005] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
(fam: 06, model: 1c, stepping: 0a)
[0.024005] Using local APIC timer interrupts.
[0.024005] calibrating APIC timer ...
[0.026000] ... lapic delta = 624992
[0.026000] ... PM-Timer delta = 0
[0.026000] . delta 624992
[0.026000] . mult: 26843202
[0.026000] . calibration result: 8
[0.026000] . CPU clock speed is 1199.0984 MHz.
[0.026000] . host bus clock speed is 99.0998 MHz.
[0.026000] ... verify APIC timer
[0.128565] ... jiffies delta = 100
[0.129004] ... jiffies result ok

and it goes on. As you see, ..TIMER: is missing in the upper output.

 
 Thanks,
 
   tglx

Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Thomas Gleixner
On Wed, 28 May 2014, Sebastian Andrzej Siewior wrote:
 On 05/27/2014 09:58 PM, Thomas Gleixner wrote:
  ce4100 is an oddball though. The ioapic is registered way before the
  interrupt subsystem is initialized and I have a hard time to
  understand that comment:
  
  /* We can't set this earlier, because we need to calibrate the 
  timer */
  legacy_pic = null_legacy_pic;
  
  The timer calibration happens after the interrupts are set up. I
  assume it's check_timer() which wants that, but we know exactly how
  the ce4100 works, so we might be able to avoid that whole testing
  stuff. Sebastian, any input on this?
 
 According to my memory there was some PIC init we need but I don't
 recall the details. However, booting the system with legacy_pic =
 null_legacy_pic; in x86_ce4100_early_setup() gives me this:
 
 [0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
 [0.001000] leaving PIC mode, enabling APIC mode.
 [0.001000] enabled ExtINT on CPU#0
 [0.001000] ENABLING IO-APIC IRQs
 [0.001000] Setting 1 in the phys_id_present_map
 [0.001000] ...changing IO-APIC physical APIC ID to 1 ... ok.
 [0.001000] Setting 2 in the phys_id_present_map
 [0.001000] ...changing IO-APIC physical APIC ID to 2 ... ok.
 [0.001000] smpboot: CPU0: Intel(R) Atom(TM) CPU CE4150   @ 1.20GHz
 (fam: 06, model: 1c, stepping: 0a)
 [0.001000] Using local APIC timer interrupts.
 [0.001000] calibrating APIC timer ...
 
 and we stand still. With the assignment as it is now:
 
 [0.001000] Enabling APIC mode:  Flat.  Using 2 I/O APICs
 [0.001000] leaving PIC mode, enabling APIC mode.
 [0.001000] enabled ExtINT on CPU#0
 [0.002312] ENABLING IO-APIC IRQs
 [0.003009] Setting 1 in the phys_id_present_map
 [0.004007] ...changing IO-APIC physical APIC ID to 1 ... ok.
 [0.005533] Setting 2 in the phys_id_present_map
 [0.006006] ...changing IO-APIC physical APIC ID to 2 ... ok.
 [0.008373] ..TIMER: vector=0x30 apic1=-1 pin1=-1 apic2=-1 pin2=-1
 [0.009000] ...trying to set up timer as Virtual Wire IRQ...
 [0.019490] . works.

Right, so it needs the setup of irq 0 and that only happens when the
legacy_pic-nr_legacy_irqs  0.

Do you remember, why we switch to the null_pic later on?

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-28 Thread Sebastian Andrzej Siewior
On 05/28/2014 12:07 PM, Thomas Gleixner wrote:
 
 Right, so it needs the setup of irq 0 and that only happens when the
 legacy_pic-nr_legacy_irqs  0.

 Do you remember, why we switch to the null_pic later on?

According to my memory all interrupts are serviced by the IOAPIC. The
first 16 by the first IOAPIC, 17+ (everything behind the PCI-bus) by
the second IOAPIC. For that reason I didn't see a reason for the legacy
PIC and set it to null_pic. Due to the timer problem it is delayed.

 Thanks,
 
   tglx

Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Jiang Liu
Hi Thomas,
Thanks for your comments. Please refer to inline
comments below.

On 2014/5/28 3:58, Thomas Gleixner wrote:
> Jiang,
> 
> On Tue, 27 May 2014, Jiang Liu wrote:
> 
>> +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int 
>> pin)
>>  {
>> +int irq = -1;
>> +
>> +if (gsi >= arch_dynirq_lower_bound(0)) {
>> +irq = irq_create_mapping(domain, pin);
>> +} else if (gsi < NR_IRQS_LEGACY) {
>> +if (!ioapic_identity_map)
>> +irq = irq_create_mapping(domain, pin);
>> +else if (irq_domain_associate(domain, gsi, pin) == 0)
>> +irq = gsi;
>> +} else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
>> +irq = gsi;
>> +}
> 
> So you have these cases covered here:
> 
> 1) The ACPI case of secondary ioapics. You only have the strict 1:1
>mapping for the first ioapic
> 
> 2) The gsi < NR_IRQS_LEGACY case where you have two options:
> 
> a) Let the core create a random virq number if ioapic_identity_map
>is 0
> 
>ioapic_identity_map is only set by SFI and devicetree
> 
>So in all other cases we fall into that code path for all
>legacy interrupts. So how is that supposed to work lets say for
>i8042 which has hardcoded irq 1 and 12?
> 
>irq_create_mapping(1)
>
>   hint = 1 % nr_irqs; --> 1
>   virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
> 
>   This returns something >= 16, because the irq descriptors
>   for 0-15 (LEGACY) are allocated already.
> 
>The pin association works, but how is the i8042 driver supposed
>to figure out that it should request the virq >=16 which was
>created instead of the hardcoded 1 ?
This is used to work around special non-ISA interrupts with GSI below
NR_IRQS_LEGACY. The original code for the special case is:
/*
 * Provide an identity mapping of gsi == irq except on truly
 * weird platforms that have non isa irqs in the first 16 gsis.
 */
return gsi >= NR_IRQS_LEGACY ? gsi : gsi_top + gsi;

We have one path to handle ISA IRQs before calling
alloc_irq_from_domain() as below:
if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci))
return mp_irqs[idx].srcbusirq;

>   
> b) Associate the gsi and the pin
> 
>This only works because the virqs are already allocated at boot
>time unconditionally due to arch_probe_nr_irqs() returning
>NR_IRQS_LEGACY. So irq_domain_associate() works.
>Undocumented works by chance behaviour.
Yes. It's a good suggestion to enhance legacy_pic to make this
code more clear.

> 
> 3) The case where gsi < arch_dynirq_lower_bound()
> 
>You create a strict mapping here, fine.
> 
> This is confusing at best.
> 
> First of all, we should use legacy_pic->nr_legacy_irqs instead of
> NR_IRQS_LEGACY all over the place.
> 
> mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
> nr_legacy_irqs = 0 and everything else uses the real pic which has
> nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
> and deal with NR_IRQS_LEGACY in the cases where we have no legacy?
I'm not sure whether it works with ce4100, so used NR_IRQS_LEGACY
instead of legacy_pic->nr_legacy_irqs for safety. Will try to refine
it in next version.

> 
> ce4100 is an oddball though. The ioapic is registered way before the
> interrupt subsystem is initialized and I have a hard time to
> understand that comment:
> 
> /* We can't set this earlier, because we need to calibrate the timer 
> */
> legacy_pic = _legacy_pic;
I haven't figured out the story behind the comment yet:(

> 
> The timer calibration happens after the interrupts are set up. I
> assume it's check_timer() which wants that, but we know exactly how
> the ce4100 works, so we might be able to avoid that whole "testing"
> stuff. Sebastian, any input on this?
> 
> If it turns out that ce4100 needs the inital real legacy pic for some
> magic reason we still can be clever by extending the legacy pic data
> structure to tell us about that change, i.e. instead of using
> legacy_pic->nr_legacy_irqs having a field "nr_allocated_irqs", which
> is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
> and let ce4100 set that field to NR_IRQS_LEGACY before switching the
> legacy_pic over to the null implementation.
Good suggestion, will try this way.

> But what's really disgusting is the magic ioapic_identity_map and the
> extra ACPI specific ioapic_dynirq_base hackery.
> 
> Why do we need strict mappings in the non ACPI case for all ioapic
> pins? What's so different about ACPI? Or is this just to avoid
> breaking the existing SFI/devicetree stuff. If that's the reason I'm
> fine with it, but ...
It's to avoid breaking SFI/intel_mid stuff. intel_mid assumes IRQ
number equals to pin number and use pci_dev->irq to save both IRQ
number and pin number.

Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Thomas Gleixner
Jiang,

On Tue, 27 May 2014, Jiang Liu wrote:

> +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin)
>  {
> + int irq = -1;
> +
> + if (gsi >= arch_dynirq_lower_bound(0)) {
> + irq = irq_create_mapping(domain, pin);
> + } else if (gsi < NR_IRQS_LEGACY) {
> + if (!ioapic_identity_map)
> + irq = irq_create_mapping(domain, pin);
> + else if (irq_domain_associate(domain, gsi, pin) == 0)
> + irq = gsi;
> + } else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
> + irq = gsi;
> + }

So you have these cases covered here:

1) The ACPI case of secondary ioapics. You only have the strict 1:1
   mapping for the first ioapic

2) The gsi < NR_IRQS_LEGACY case where you have two options:

a) Let the core create a random virq number if ioapic_identity_map
   is 0

   ioapic_identity_map is only set by SFI and devicetree

   So in all other cases we fall into that code path for all
   legacy interrupts. So how is that supposed to work lets say for
   i8042 which has hardcoded irq 1 and 12?

   irq_create_mapping(1)
   
hint = 1 % nr_irqs; --> 1
virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));

This returns something >= 16, because the irq descriptors
for 0-15 (LEGACY) are allocated already.

   The pin association works, but how is the i8042 driver supposed
   to figure out that it should request the virq >=16 which was
   created instead of the hardcoded 1 ?

b) Associate the gsi and the pin

   This only works because the virqs are already allocated at boot
   time unconditionally due to arch_probe_nr_irqs() returning
   NR_IRQS_LEGACY. So irq_domain_associate() works.
   Undocumented works by chance behaviour.

3) The case where gsi < arch_dynirq_lower_bound()

   You create a strict mapping here, fine.

This is confusing at best.

First of all, we should use legacy_pic->nr_legacy_irqs instead of
NR_IRQS_LEGACY all over the place.

mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
nr_legacy_irqs = 0 and everything else uses the real pic which has
nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
and deal with NR_IRQS_LEGACY in the cases where we have no legacy?

ce4100 is an oddball though. The ioapic is registered way before the
interrupt subsystem is initialized and I have a hard time to
understand that comment:

/* We can't set this earlier, because we need to calibrate the timer */
legacy_pic = _legacy_pic;

The timer calibration happens after the interrupts are set up. I
assume it's check_timer() which wants that, but we know exactly how
the ce4100 works, so we might be able to avoid that whole "testing"
stuff. Sebastian, any input on this?

If it turns out that ce4100 needs the inital real legacy pic for some
magic reason we still can be clever by extending the legacy pic data
structure to tell us about that change, i.e. instead of using
legacy_pic->nr_legacy_irqs having a field "nr_allocated_irqs", which
is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
and let ce4100 set that field to NR_IRQS_LEGACY before switching the
legacy_pic over to the null implementation.


But what's really disgusting is the magic ioapic_identity_map and the
extra ACPI specific ioapic_dynirq_base hackery.

Why do we need strict mappings in the non ACPI case for all ioapic
pins? What's so different about ACPI? Or is this just to avoid
breaking the existing SFI/devicetree stuff. If that's the reason I'm
fine with it, but ...

independent of this question we want to be more clever about the
handling of strict, legacy and freely associated linux irq numbers.

So you have this weird ioapic_create_domain_fn callback in
mp_register_ioapic, which is solely there so the different callers can
hand in their domain ops and eventually set the magic
ioapic_identity_map flag.

+void __init mp_register_ioapic(int id, u32 address, u32 gsi_base,
+  ioapic_create_domain_fn cb, void *arg)

What about having

struct ioapic_domain {
   struct irqdomain *domain;
   const struct irq_domain_ops  *ops;
   void *arg;
   enum domain_type type;
};

and add this struct to the ioapic struct. type is:

enum domain_type {
   IOAPIC_STRICT,
   IOAPIC_LEGACY,
   IOAPIC_DYNAMIC,
}; 


Now the register function changes to:

void mp_register_ioapic(int id, u32 address, u32 gsi_base,
const struct irq_domain_ops *ops,
int type)
{
...

ioapics[idx].irqdomain.ops = ops;
ioapics[idx].irqdomain.arg = arg;
ioapics[idx].irqdomain.type = type;
ioapics[idx].irqdomain.domain = NULL;

...
}

and you can use mp_irqdomain_create() 

[Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Jiang Liu
Currently x86 support identity mapping between GSI(IOAPIC pin) and IRQ
number, so continous IRQs at low end are statically allocated to IOAPICs
at boot time. This design causes trouble to support IOAPIC hotplug.

This patch implements basic mechanism to dynamically allocate IRQ on
demand for IOAPIC pins by using irqdomain framework.

It first adds several fields into struct ioapic to support irqdomain.
Then it implements an algorithm to dynamically allocate IRQ number
for IOAPIC pins on demand as below:
1) Legacy(ISA) IRQs is not managed by irqdomain because there may be
   multiple pins sharing the same IRQ number and current irqdomain
   only supports 1:1 mapping between pins and IRQ.
2) Build identity mapping for GSIs between NR_IRQS_LEGACY and
   arch_dynirq_lower_bound(0). This is typically used to support legacy
   IRQs and simple platforms.
3) Dynamically allocate IRQs for GSIs above arch_dynirq_lower_bound(0).
   This may be used to support big system and IOAPIC hotplug.
4) Dynamically allocate IRQs for non-ISA IRQs below NR_IRQS_LEGACY
   if platform doens't request identity mapping. This is to support
   some really weired platforms.
5) Build identity mapping for GSIs below NR_IRQS_LEGACY if platform requests
   identity mapping. This is to support MID and CE41000 etc.

Function arch_dynirq_lower_bound(0) will be enhanced in coming patch
to enable dynamic IRQ allocation for IOAPIC. To ease our life,
arch_dynirq_lower_bound(0) must be greater than or equal to
NR_IRQS_LEGACY, otherwise it may break backward compatibilities.

Signed-off-by: Jiang Liu 
---
 arch/x86/Kconfig   |1 +
 arch/x86/include/asm/io_apic.h |   11 +++-
 arch/x86/kernel/acpi/boot.c|8 +--
 arch/x86/kernel/apic/io_apic.c |  142 ++--
 4 files changed, 120 insertions(+), 42 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 47247708c9eb..ebdb1193821b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -832,6 +832,7 @@ config X86_IO_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC || 
PCI_MSI
select GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
+   select IRQ_DOMAIN
 
 config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
bool "Reroute for broken boot IRQs"
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 978e51fdcb59..4778d129f225 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -98,6 +98,8 @@ struct IR_IO_APIC_route_entry {
 #define IOAPIC_AUTO -1
 #define IOAPIC_EDGE 0
 #define IOAPIC_LEVEL1
+#defineIOAPIC_MAP_ALLOC0x1
+#defineIOAPIC_MAP_CHECK0x2
 
 #ifdef CONFIG_X86_IO_APIC
 
@@ -130,6 +132,9 @@ extern int noioapicquirk;
 /* -1 if "noapic" boot option passed */
 extern int noioapicreroute;
 
+/* Build identity mapping for non-ISA IRQs below NR_IRQS_LEGACY */
+extern int ioapic_identity_map;
+
 /*
  * If we use the IO-APIC for IRQ routing, disable automatic
  * assignment of PCI IRQ's.
@@ -169,10 +174,12 @@ struct mp_ioapic_gsi{
 };
 extern u32 gsi_top;
 
+typedef struct irq_domain *(*ioapic_create_domain_fn)(int idx, void *arg);
+
 extern int mp_find_ioapic(u32 gsi);
 extern int mp_find_ioapic_pin(int ioapic, u32 gsi);
 extern u32 mp_pin_to_gsi(int ioapic, int pin);
-extern int mp_map_gsi_to_irq(u32 gsi);
+extern int mp_map_gsi_to_irq(u32 gsi, unsigned int flags);
 extern void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
 extern void __init pre_init_apic_IRQ0(void);
 
@@ -215,7 +222,7 @@ static inline void ioapic_insert_resources(void) { }
 #define gsi_top (NR_IRQS_LEGACY)
 static inline int mp_find_ioapic(u32 gsi) { return 0; }
 static inline u32 mp_pin_to_gsi(int ioapic, int pin) { return UINT_MAX; }
-static inline int mp_map_gsi_to_irq(u32 gsi) { return gsi; }
+static inline int mp_map_gsi_to_irq(u32 gsi, unsigned int flags) { return gsi; 
}
 
 struct io_apic_irq_attr;
 static inline int io_apic_set_pci_routing(struct device *dev, int irq,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index f740d2766c42..9bbdd685df64 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -99,7 +99,7 @@ static u32 isa_irq_to_gsi[NR_IRQS_LEGACY] __read_mostly = {
 
 #defineACPI_INVALID_GSIINT_MIN
 
-static int map_gsi_to_irq(unsigned int gsi)
+static int map_gsi_to_irq(unsigned int gsi, unsigned int flags)
 {
int i;
 
@@ -107,7 +107,7 @@ static int map_gsi_to_irq(unsigned int gsi)
if (isa_irq_to_gsi[i] == gsi)
return i;
 
-   return mp_map_gsi_to_irq(gsi);
+   return mp_map_gsi_to_irq(gsi, flags);
 }
 
 /*
@@ -483,7 +483,7 @@ void __init acpi_pic_sci_set_trigger(unsigned int irq, u16 
trigger)
 
 int acpi_gsi_to_irq(u32 gsi, unsigned int *irqp)
 {
-   int irq = map_gsi_to_irq(gsi);
+   int irq = map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC | 

[Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Jiang Liu
Currently x86 support identity mapping between GSI(IOAPIC pin) and IRQ
number, so continous IRQs at low end are statically allocated to IOAPICs
at boot time. This design causes trouble to support IOAPIC hotplug.

This patch implements basic mechanism to dynamically allocate IRQ on
demand for IOAPIC pins by using irqdomain framework.

It first adds several fields into struct ioapic to support irqdomain.
Then it implements an algorithm to dynamically allocate IRQ number
for IOAPIC pins on demand as below:
1) Legacy(ISA) IRQs is not managed by irqdomain because there may be
   multiple pins sharing the same IRQ number and current irqdomain
   only supports 1:1 mapping between pins and IRQ.
2) Build identity mapping for GSIs between NR_IRQS_LEGACY and
   arch_dynirq_lower_bound(0). This is typically used to support legacy
   IRQs and simple platforms.
3) Dynamically allocate IRQs for GSIs above arch_dynirq_lower_bound(0).
   This may be used to support big system and IOAPIC hotplug.
4) Dynamically allocate IRQs for non-ISA IRQs below NR_IRQS_LEGACY
   if platform doens't request identity mapping. This is to support
   some really weired platforms.
5) Build identity mapping for GSIs below NR_IRQS_LEGACY if platform requests
   identity mapping. This is to support MID and CE41000 etc.

Function arch_dynirq_lower_bound(0) will be enhanced in coming patch
to enable dynamic IRQ allocation for IOAPIC. To ease our life,
arch_dynirq_lower_bound(0) must be greater than or equal to
NR_IRQS_LEGACY, otherwise it may break backward compatibilities.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 arch/x86/Kconfig   |1 +
 arch/x86/include/asm/io_apic.h |   11 +++-
 arch/x86/kernel/acpi/boot.c|8 +--
 arch/x86/kernel/apic/io_apic.c |  142 ++--
 4 files changed, 120 insertions(+), 42 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 47247708c9eb..ebdb1193821b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -832,6 +832,7 @@ config X86_IO_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC || 
PCI_MSI
select GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
+   select IRQ_DOMAIN
 
 config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
bool Reroute for broken boot IRQs
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 978e51fdcb59..4778d129f225 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -98,6 +98,8 @@ struct IR_IO_APIC_route_entry {
 #define IOAPIC_AUTO -1
 #define IOAPIC_EDGE 0
 #define IOAPIC_LEVEL1
+#defineIOAPIC_MAP_ALLOC0x1
+#defineIOAPIC_MAP_CHECK0x2
 
 #ifdef CONFIG_X86_IO_APIC
 
@@ -130,6 +132,9 @@ extern int noioapicquirk;
 /* -1 if noapic boot option passed */
 extern int noioapicreroute;
 
+/* Build identity mapping for non-ISA IRQs below NR_IRQS_LEGACY */
+extern int ioapic_identity_map;
+
 /*
  * If we use the IO-APIC for IRQ routing, disable automatic
  * assignment of PCI IRQ's.
@@ -169,10 +174,12 @@ struct mp_ioapic_gsi{
 };
 extern u32 gsi_top;
 
+typedef struct irq_domain *(*ioapic_create_domain_fn)(int idx, void *arg);
+
 extern int mp_find_ioapic(u32 gsi);
 extern int mp_find_ioapic_pin(int ioapic, u32 gsi);
 extern u32 mp_pin_to_gsi(int ioapic, int pin);
-extern int mp_map_gsi_to_irq(u32 gsi);
+extern int mp_map_gsi_to_irq(u32 gsi, unsigned int flags);
 extern void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
 extern void __init pre_init_apic_IRQ0(void);
 
@@ -215,7 +222,7 @@ static inline void ioapic_insert_resources(void) { }
 #define gsi_top (NR_IRQS_LEGACY)
 static inline int mp_find_ioapic(u32 gsi) { return 0; }
 static inline u32 mp_pin_to_gsi(int ioapic, int pin) { return UINT_MAX; }
-static inline int mp_map_gsi_to_irq(u32 gsi) { return gsi; }
+static inline int mp_map_gsi_to_irq(u32 gsi, unsigned int flags) { return gsi; 
}
 
 struct io_apic_irq_attr;
 static inline int io_apic_set_pci_routing(struct device *dev, int irq,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index f740d2766c42..9bbdd685df64 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -99,7 +99,7 @@ static u32 isa_irq_to_gsi[NR_IRQS_LEGACY] __read_mostly = {
 
 #defineACPI_INVALID_GSIINT_MIN
 
-static int map_gsi_to_irq(unsigned int gsi)
+static int map_gsi_to_irq(unsigned int gsi, unsigned int flags)
 {
int i;
 
@@ -107,7 +107,7 @@ static int map_gsi_to_irq(unsigned int gsi)
if (isa_irq_to_gsi[i] == gsi)
return i;
 
-   return mp_map_gsi_to_irq(gsi);
+   return mp_map_gsi_to_irq(gsi, flags);
 }
 
 /*
@@ -483,7 +483,7 @@ void __init acpi_pic_sci_set_trigger(unsigned int irq, u16 
trigger)
 
 int acpi_gsi_to_irq(u32 gsi, unsigned int *irqp)
 {
-   int irq = map_gsi_to_irq(gsi);
+   int irq = map_gsi_to_irq(gsi, 

Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Thomas Gleixner
Jiang,

On Tue, 27 May 2014, Jiang Liu wrote:

 +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin)
  {
 + int irq = -1;
 +
 + if (gsi = arch_dynirq_lower_bound(0)) {
 + irq = irq_create_mapping(domain, pin);
 + } else if (gsi  NR_IRQS_LEGACY) {
 + if (!ioapic_identity_map)
 + irq = irq_create_mapping(domain, pin);
 + else if (irq_domain_associate(domain, gsi, pin) == 0)
 + irq = gsi;
 + } else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
 + irq = gsi;
 + }

So you have these cases covered here:

1) The ACPI case of secondary ioapics. You only have the strict 1:1
   mapping for the first ioapic

2) The gsi  NR_IRQS_LEGACY case where you have two options:

a) Let the core create a random virq number if ioapic_identity_map
   is 0

   ioapic_identity_map is only set by SFI and devicetree

   So in all other cases we fall into that code path for all
   legacy interrupts. So how is that supposed to work lets say for
   i8042 which has hardcoded irq 1 and 12?

   irq_create_mapping(1)
   
hint = 1 % nr_irqs; -- 1
virq = irq_alloc_desc_from(hint, of_node_to_nid(domain-of_node));

This returns something = 16, because the irq descriptors
for 0-15 (LEGACY) are allocated already.

   The pin association works, but how is the i8042 driver supposed
   to figure out that it should request the virq =16 which was
   created instead of the hardcoded 1 ?

b) Associate the gsi and the pin

   This only works because the virqs are already allocated at boot
   time unconditionally due to arch_probe_nr_irqs() returning
   NR_IRQS_LEGACY. So irq_domain_associate() works.
   Undocumented works by chance behaviour.

3) The case where gsi  arch_dynirq_lower_bound()

   You create a strict mapping here, fine.

This is confusing at best.

First of all, we should use legacy_pic-nr_legacy_irqs instead of
NR_IRQS_LEGACY all over the place.

mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
nr_legacy_irqs = 0 and everything else uses the real pic which has
nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
and deal with NR_IRQS_LEGACY in the cases where we have no legacy?

ce4100 is an oddball though. The ioapic is registered way before the
interrupt subsystem is initialized and I have a hard time to
understand that comment:

/* We can't set this earlier, because we need to calibrate the timer */
legacy_pic = null_legacy_pic;

The timer calibration happens after the interrupts are set up. I
assume it's check_timer() which wants that, but we know exactly how
the ce4100 works, so we might be able to avoid that whole testing
stuff. Sebastian, any input on this?

If it turns out that ce4100 needs the inital real legacy pic for some
magic reason we still can be clever by extending the legacy pic data
structure to tell us about that change, i.e. instead of using
legacy_pic-nr_legacy_irqs having a field nr_allocated_irqs, which
is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
and let ce4100 set that field to NR_IRQS_LEGACY before switching the
legacy_pic over to the null implementation.


But what's really disgusting is the magic ioapic_identity_map and the
extra ACPI specific ioapic_dynirq_base hackery.

Why do we need strict mappings in the non ACPI case for all ioapic
pins? What's so different about ACPI? Or is this just to avoid
breaking the existing SFI/devicetree stuff. If that's the reason I'm
fine with it, but ...

independent of this question we want to be more clever about the
handling of strict, legacy and freely associated linux irq numbers.

So you have this weird ioapic_create_domain_fn callback in
mp_register_ioapic, which is solely there so the different callers can
hand in their domain ops and eventually set the magic
ioapic_identity_map flag.

+void __init mp_register_ioapic(int id, u32 address, u32 gsi_base,
+  ioapic_create_domain_fn cb, void *arg)

What about having

struct ioapic_domain {
   struct irqdomain *domain;
   const struct irq_domain_ops  *ops;
   void *arg;
   enum domain_type type;
};

and add this struct to the ioapic struct. type is:

enum domain_type {
   IOAPIC_STRICT,
   IOAPIC_LEGACY,
   IOAPIC_DYNAMIC,
}; 


Now the register function changes to:

void mp_register_ioapic(int id, u32 address, u32 gsi_base,
const struct irq_domain_ops *ops,
int type)
{
...

ioapics[idx].irqdomain.ops = ops;
ioapics[idx].irqdomain.arg = arg;
ioapics[idx].irqdomain.type = type;
ioapics[idx].irqdomain.domain = NULL;

...
}

and you can use mp_irqdomain_create() unconditionally for 

Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Jiang Liu
Hi Thomas,
Thanks for your comments. Please refer to inline
comments below.

On 2014/5/28 3:58, Thomas Gleixner wrote:
 Jiang,
 
 On Tue, 27 May 2014, Jiang Liu wrote:
 
 +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int 
 pin)
  {
 +int irq = -1;
 +
 +if (gsi = arch_dynirq_lower_bound(0)) {
 +irq = irq_create_mapping(domain, pin);
 +} else if (gsi  NR_IRQS_LEGACY) {
 +if (!ioapic_identity_map)
 +irq = irq_create_mapping(domain, pin);
 +else if (irq_domain_associate(domain, gsi, pin) == 0)
 +irq = gsi;
 +} else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
 +irq = gsi;
 +}
 
 So you have these cases covered here:
 
 1) The ACPI case of secondary ioapics. You only have the strict 1:1
mapping for the first ioapic
 
 2) The gsi  NR_IRQS_LEGACY case where you have two options:
 
 a) Let the core create a random virq number if ioapic_identity_map
is 0
 
ioapic_identity_map is only set by SFI and devicetree
 
So in all other cases we fall into that code path for all
legacy interrupts. So how is that supposed to work lets say for
i8042 which has hardcoded irq 1 and 12?
 
irq_create_mapping(1)

   hint = 1 % nr_irqs; -- 1
   virq = irq_alloc_desc_from(hint, of_node_to_nid(domain-of_node));
 
   This returns something = 16, because the irq descriptors
   for 0-15 (LEGACY) are allocated already.
 
The pin association works, but how is the i8042 driver supposed
to figure out that it should request the virq =16 which was
created instead of the hardcoded 1 ?
This is used to work around special non-ISA interrupts with GSI below
NR_IRQS_LEGACY. The original code for the special case is:
/*
 * Provide an identity mapping of gsi == irq except on truly
 * weird platforms that have non isa irqs in the first 16 gsis.
 */
return gsi = NR_IRQS_LEGACY ? gsi : gsi_top + gsi;

We have one path to handle ISA IRQs before calling
alloc_irq_from_domain() as below:
if (idx = 0  test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci))
return mp_irqs[idx].srcbusirq;

   
 b) Associate the gsi and the pin
 
This only works because the virqs are already allocated at boot
time unconditionally due to arch_probe_nr_irqs() returning
NR_IRQS_LEGACY. So irq_domain_associate() works.
Undocumented works by chance behaviour.
Yes. It's a good suggestion to enhance legacy_pic to make this
code more clear.

 
 3) The case where gsi  arch_dynirq_lower_bound()
 
You create a strict mapping here, fine.
 
 This is confusing at best.
 
 First of all, we should use legacy_pic-nr_legacy_irqs instead of
 NR_IRQS_LEGACY all over the place.
 
 mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
 nr_legacy_irqs = 0 and everything else uses the real pic which has
 nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
 and deal with NR_IRQS_LEGACY in the cases where we have no legacy?
I'm not sure whether it works with ce4100, so used NR_IRQS_LEGACY
instead of legacy_pic-nr_legacy_irqs for safety. Will try to refine
it in next version.

 
 ce4100 is an oddball though. The ioapic is registered way before the
 interrupt subsystem is initialized and I have a hard time to
 understand that comment:
 
 /* We can't set this earlier, because we need to calibrate the timer 
 */
 legacy_pic = null_legacy_pic;
I haven't figured out the story behind the comment yet:(

 
 The timer calibration happens after the interrupts are set up. I
 assume it's check_timer() which wants that, but we know exactly how
 the ce4100 works, so we might be able to avoid that whole testing
 stuff. Sebastian, any input on this?
 
 If it turns out that ce4100 needs the inital real legacy pic for some
 magic reason we still can be clever by extending the legacy pic data
 structure to tell us about that change, i.e. instead of using
 legacy_pic-nr_legacy_irqs having a field nr_allocated_irqs, which
 is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
 and let ce4100 set that field to NR_IRQS_LEGACY before switching the
 legacy_pic over to the null implementation.
Good suggestion, will try this way.

 But what's really disgusting is the magic ioapic_identity_map and the
 extra ACPI specific ioapic_dynirq_base hackery.
 
 Why do we need strict mappings in the non ACPI case for all ioapic
 pins? What's so different about ACPI? Or is this just to avoid
 breaking the existing SFI/devicetree stuff. If that's the reason I'm
 fine with it, but ...
It's to avoid breaking SFI/intel_mid stuff. intel_mid assumes IRQ
number equals to pin number and use pci_dev-irq to save both IRQ
number and pin number.

 
 independent of this question we want to be more clever about the
 handling of strict, legacy and freely associated linux