Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-20 Thread Peter Zijlstra
On Wed, Dec 20, 2017 at 10:08:21PM +0800, Zhang Rui wrote:

> > Does this help?
> > 
> No.

Bah! Does this at least get you an IRQ0 line in /proc/interrupts?

> > diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
> > index 749d189f8cd4..45675072771c 100644
> > --- a/arch/x86/kernel/time.c
> > +++ b/arch/x86/kernel/time.c
> > @@ -69,8 +69,6 @@ static struct irqaction irq0  = {
> >  
> >  static void __init setup_default_timer_irq(void)
> >  {
> > -	if (!nr_legacy_irqs())
> > -		return;
> >  	setup_irq(0, &irq0);
> >  }
> >  


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-20 Thread Zhang Rui
On Tue, 2017-12-19 at 18:23 +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 05:01:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote:
> > > On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote:
> > 
> > > > [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > > > [0.00] ACPI: IRQ0 used by override.
> > > > 
> > > > So your ACPI table has an override for IRQ2 and routes it to IRQ0.
> > 
> > ^ this
> > 
> > > >  The HPET document says:
> > > > 
> > > >   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
> > > >   interrupts will be routed as follows:
> > > > 
> > > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC
> > > 
> > > But AFAICS, the HPET emulated timer interrupts goes to IRQ0
> > 
> > Right, so see that ACPI override, that routes I/O APIC IRQ2 to IRQ0, or
> > it _should_.
> > 
> > Clearly something is messed up here.. but I've no idea what. That whole
> > IRQ routing stuff is confusing.
> 
> Does this help?
> 
No.

thanks,
rui
> diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
> index 749d189f8cd4..45675072771c 100644
> --- a/arch/x86/kernel/time.c
> +++ b/arch/x86/kernel/time.c
> @@ -69,8 +69,6 @@ static struct irqaction irq0  = {
>  
>  static void __init setup_default_timer_irq(void)
>  {
> -	if (!nr_legacy_irqs())
> -		return;
>  	setup_irq(0, &irq0);
>  }
>  


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Peter Zijlstra
On Tue, Dec 19, 2017 at 05:01:55PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote:
> > On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote:
> 
> > > [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > > [0.00] ACPI: IRQ0 used by override.
> > > 
> > > So your ACPI table has an override for IRQ2 and routes it to IRQ0.
> 
> ^ this
> 
> > >  The HPET document says:
> > > 
> > >   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
> > >   interrupts will be routed as follows:
> > > 
> > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC
> > 
> > But AFAICS, the HPET emulated timer interrupts goes to IRQ0
> 
> Right, so see that ACPI override, that routes I/O APIC IRQ2 to IRQ0, or
> it _should_.
> 
> Clearly something is messed up here.. but I've no idea what. That whole
> IRQ routing stuff is confusing.

Does this help?

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index 749d189f8cd4..45675072771c 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -69,8 +69,6 @@ static struct irqaction irq0  = {
 
 static void __init setup_default_timer_irq(void)
 {
-	if (!nr_legacy_irqs())
-		return;
 	setup_irq(0, &irq0);
 }
 


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Peter Zijlstra
On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote:
> On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote:

> > [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > [0.00] ACPI: IRQ0 used by override.
> > 
> > So your ACPI table has an override for IRQ2 and routes it to IRQ0.

^ this

> >  The HPET document says:
> > 
> >   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
> >   interrupts will be routed as follows:
> > 
> > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC
> 
> But AFAICS, the HPET emulated timer interrupts goes to IRQ0

Right, so see that ACPI override, that routes I/O APIC IRQ2 to IRQ0, or
it _should_.

Clearly something is messed up here.. but I've no idea what. That whole
IRQ routing stuff is confusing.
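
The direction of the override is easy to get backwards, so here is a minimal C sketch of how an INT_SRC_OVR entry is normally read: the legacy (ISA) bus_irq is the source and global_irq is the I/O APIC input it lands on. The struct, table, and function names are hypothetical illustrations, not kernel code; the two entries are the ones printed in the boot log quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one ACPI MADT Interrupt Source Override entry. */
struct int_src_ovr {
    unsigned int bus_irq;    /* legacy ISA IRQ number (the source) */
    unsigned int global_irq; /* GSI, i.e. the I/O APIC input it is wired to */
};

/* The two overrides reported in the boot log quoted in this thread. */
static const struct int_src_ovr overrides[] = {
    { 0, 2 },
    { 9, 9 },
};

/* Return the GSI for an ISA IRQ: identity unless an override says otherwise. */
static unsigned int isa_irq_to_gsi(unsigned int isa_irq)
{
    size_t i;

    assert(isa_irq < 16); /* only the 16 legacy ISA IRQs are modeled */
    for (i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++) {
        if (overrides[i].bus_irq == isa_irq)
            return overrides[i].global_irq;
    }
    return isa_irq;
}
```

Under this reading, ISA IRQ0 (the timer) is expected on I/O APIC pin 2, which matches the HPET text saying Timer 0 goes to "IRQ2 in the I/O APIC" when legacy replacement routing is enabled.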




Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Zhang Rui
On Tue, 2017-12-19 at 16:31 +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 04:23:07PM +0100, Peter Zijlstra wrote:
> > 
> > On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> > > 
> > > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > > > 
> > > > Hi, can you see if this makes you Surface boot?
> > > > 
> > > No, it does not boot.
> > So I'm confused on the lapic calibration.
> > 
> > That stuff uses global_clock_event, which is initially the i8253
> > (PIT),
> > but because !PIC this thing won't be there either on your platform.
> > 
> > Then we initialize I/O APIC, and your machine has:
> > 
> > > [0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
> > > [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > > [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> > [0.00] ACPI: IRQ0 used by override.
> > [0.00] ACPI: IRQ9 used by override.
> > 
> > So your ACPI table has an override for IRQ2 and routes it to IRQ0.
> > 
> > Then we initialize HPET, and we _always_ do
> > hpet_enable_legacy_int(),
> > which sets the LegacyRouting bit. The HPET document says:
> > 
> >   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then
> > the
> >   interrupts will be routed as follows:
> > 
> > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O
> > APIC
> > Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O
> > APIC
> > Timer 2-n will be routed as per the routing in the timer n
> > config registers.
> > 
> >   If the LegacyReplacement Route bit is set, the individual routing
> > bits
> >   for timers 0 and 1 (APIC or FSB) will have no impact.
> > 
> > > And then we set global_clock_event to &hpet_clockevent.
> > 
> > At this point that _SHOULD_ work afaict, even without actual PIC
> > present.
> > 
> > Sometime after that we call into calibrate_APIC_clock() -- because
> > !TSC_DEADLINE -- and this is where you get stuck, because
> > global_clock_event is not in fact delivering interrupts.
> > 
> > Thomas may have more clue, we'll have to wait for him to have a
> > time-slot available.
> What does your /proc/interrupts look like (on a tsc-deadline boot) ?
> 
> 0: is the HPET, LOC: is the lapic/tsc-deadline
> 
> On my SKL desktop I get 21 PIT/HPET ticks on CPU0 before lapic takes
> over.
No irq0.
 LOC: 220071 210079 184892 176494   Local timer interrupts

thanks,
rui


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Zhang Rui
On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> > 
> > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > > 
> > > Hi, can you see if this makes you Surface boot?
> > > 
> > No, it does not boot.
> So I'm confused on the lapic calibration.
> 
> That stuff uses global_clock_event, which is initially the i8253
> (PIT),
> but because !PIC this thing won't be there either on your platform.
> 
> Then we initialize I/O APIC, and your machine has:
> 
> [0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [0.00] ACPI: IRQ0 used by override.
> [0.00] ACPI: IRQ9 used by override.
> 
> So your ACPI table has an override for IRQ2 and routes it to IRQ0.
> 
> Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(),
> which sets the LegacyRouting bit.

Right.

>  The HPET document says:
> 
>   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
>   interrupts will be routed as follows:
> 
> Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O
> APIC

But AFAICS, the HPET emulated timer interrupt goes to IRQ0 on all the
machines I have tested. On this MS Surface Pro 4, however, there is no
irq 0 row in /proc/interrupts.

$ cat /proc/interrupts
CPU0   CPU1   CPU2   CPU3
   8:  0  1  0  0  IR-IO-APIC    8-edge      rtc0
   9:476144   4573132  IR-IO-APIC    9-fasteoi   acpi
  14:  0  0  0  0  IR-IO-APIC   14-edge      INT344B:00
  16:646   4971  63574   3973  IR-IO-APIC   16-fasteoi   idma64.0, MRVL_PCIE, i2c_designware.0
  17:  0  0  0  0  IR-IO-APIC   17-fasteoi   idma64.1, i2c_designware.1
  18:  0  0  0  0  IR-IO-APIC   18-fasteoi   idma64.2, i2c_designware.2
  19:  0  0  0  0  IR-IO-APIC   19-fasteoi   idma64.3, i2c_designware.3
Maybe I need to check whether IRQ0 is overridden on other platforms. If
it is not, could registering HPET Timer 0 on IRQ2 help in this case?

> Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O
> APIC
> Timer 2-n will be routed as per the routing in the timer n config
> registers.
> 
>   If the LegacyReplacement Route bit is set, the individual routing
> bits
>   for timers 0 and 1 (APIC or FSB) will have no impact.
> 
> And then we set global_clock_event to &hpet_clockevent.

Yes, I can confirm this.

> 
> At this point that _SHOULD_ work afaict, even without actual PIC
> present.

So the IOAPIC is ready when we are calibrating the LAPIC timer?
> 
> Sometime after that we call into calibrate_APIC_clock() -- because
> !TSC_DEADLINE -- and this is where you get stuck, because
> global_clock_event is not in fact delivering interrupts.

right.
> 
> Thomas may have more clue, we'll have to wait for him to have a
> time-slot available.

okay.

thanks,
rui
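
The "is there an irq 0 row" check above can be done mechanically. A minimal C sketch, assuming the usual /proc/interrupts layout where each interrupt row starts with an optionally indented number followed by a colon; has_irq_row() is a hypothetical helper written for this illustration, not an existing kernel or libc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true if the /proc/interrupts-style text contains a row whose
 * label is "<irq>:". Rows such as "LOC:" or the CPU header never match
 * because they do not start with a number.
 */
static bool has_irq_row(const char *text, long irq)
{
    const char *line = text;

    assert(text != NULL);
    while (*line) {
        char *end;
        long n = strtol(line, &end, 10); /* skips leading whitespace */

        /* A match needs digits, the right value, and a trailing colon. */
        if (end != line && *end == ':' && n == irq)
            return true;

        line = strchr(line, '\n');
        if (!line)
            break;
        line++; /* step past the newline to the next row */
    }
    return false;
}
```

Reading the file into a buffer and calling has_irq_row(buf, 0) would report false for the output pasted above, since its first numbered row is 8.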


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Peter Zijlstra
On Tue, Dec 19, 2017 at 04:23:07PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > > Hi, can you see if this makes you Surface boot?
> > > 
> > No, it does not boot.
> 
> So I'm confused on the lapic calibration.
> 
> That stuff uses global_clock_event, which is initially the i8253 (PIT),
> but because !PIC this thing won't be there either on your platform.
> 
> Then we initialize I/O APIC, and your machine has:
> 
> [0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [0.00] ACPI: IRQ0 used by override.
> [0.00] ACPI: IRQ9 used by override.
> 
> So your ACPI table has an override for IRQ2 and routes it to IRQ0.
> 
> Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(),
> which sets the LegacyRouting bit. The HPET document says:
> 
>   If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
>   interrupts will be routed as follows:
> 
> Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC
> Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O APIC
> Timer 2-n will be routed as per the routing in the timer n config 
> registers.
> 
>   If the LegacyReplacement Route bit is set, the individual routing bits
>   for timers 0 and 1 (APIC or FSB) will have no impact.
> 
> And then we set global_clock_event to &hpet_clockevent.
> 
> At this point that _SHOULD_ work afaict, even without actual PIC
> present.
> 
> Sometime after that we call into calibrate_APIC_clock() -- because
> !TSC_DEADLINE -- and this is where you get stuck, because
> global_clock_event is not in fact delivering interrupts.
> 
> Thomas may have more clue, we'll have to wait for him to have a
> time-slot available.

What does your /proc/interrupts look like (on a tsc-deadline boot) ?

0: is the HPET, LOC: is the lapic/tsc-deadline

On my SKL desktop I get 21 PIT/HPET ticks on CPU0 before lapic takes over.


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Peter Zijlstra
On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > Hi, can you see if this makes you Surface boot?
> > 
> No, it does not boot.

So I'm confused on the lapic calibration.

That stuff uses global_clock_event, which is initially the i8253 (PIT),
but because !PIC this thing won't be there either on your platform.

Then we initialize I/O APIC, and your machine has:

[0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ9 used by override.

So your ACPI table has an override for IRQ2 and routes it to IRQ0.

Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(),
which sets the LegacyRouting bit. The HPET document says:

  If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the
  interrupts will be routed as follows:

Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC
Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O APIC
Timer 2-n will be routed as per the routing in the timer n config registers.

  If the LegacyReplacement Route bit is set, the individual routing bits
  for timers 0 and 1 (APIC or FSB) will have no impact.

And then we set global_clock_event to &hpet_clockevent.

At this point that _SHOULD_ work afaict, even without actual PIC
present.

Sometime after that we call into calibrate_APIC_clock() -- because
!TSC_DEADLINE -- and this is where you get stuck, because
global_clock_event is not in fact delivering interrupts.

Thomas may have more clue, we'll have to wait for him to have a
time-slot available.
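
The routing rules quoted from the HPET document above can be written down as a small decision function. This is only a sketch of the quoted text, not of what the kernel or the hardware actually does; hpet_legacy_irq() and HPET_IRQ_PER_TIMER are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker: routing comes from the timer's own config register. */
#define HPET_IRQ_PER_TIMER (-1)

/*
 * Model of the quoted LegacyReplacement routing rules.
 * io_apic selects the "I/O APIC" column, otherwise the "Non-APIC" column.
 */
static int hpet_legacy_irq(unsigned int timer, bool enable_cnf,
                           bool leg_rt_cnf, bool io_apic)
{
    assert(timer < 32); /* an HPET block has at most 32 timers */

    /* The fixed routing applies only when both bits are set. */
    if (!(enable_cnf && leg_rt_cnf))
        return HPET_IRQ_PER_TIMER;

    switch (timer) {
    case 0:
        return io_apic ? 2 : 0; /* IRQ0 in Non-APIC, IRQ2 in the I/O APIC */
    case 1:
        return 8;               /* IRQ8 in both modes */
    default:
        return HPET_IRQ_PER_TIMER;
    }
}
```

With both bits set and the I/O APIC in use, the sketch yields pin 2 for Timer 0, which is where the log's INT_SRC_OVR entry (bus_irq 0, global_irq 2) says the legacy timer interrupt should arrive.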


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Zhang Rui
On Tue, 2017-12-19 at 14:15 +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> > 
> > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > > 
> > > Hi, can you see if this makes you Surface boot?
> > > 
> > No, it does not boot.
> Bah, staring at the lapic calibrate now, that is a bit of a mess..
> 
> > 
> > > 
> > > I tested it on my IVB by making has_legacy_pic() return
> > > unconditional
> > > true.
> > > 
> > > [0.024000] tsc: Unable to calibrate against PIT
> > > [0.025000] tsc: using HPET reference calibration
> > > [0.026000] tsc: Detected 2792.451 MHz processor
> > > 
> > > ---
> > > 
> > > 
> > > diff --git a/arch/x86/include/asm/i8259.h
> > > b/arch/x86/include/asm/i8259.h
> > > index c8376b40e882..e2cfc4b52ee4 100644
> > > --- a/arch/x86/include/asm/i8259.h
> > > +++ b/arch/x86/include/asm/i8259.h
> > > @@ -69,6 +69,11 @@ struct legacy_pic {
> > >  extern struct legacy_pic *legacy_pic;
> > >  extern struct legacy_pic null_legacy_pic;
> > >  
> > > +static inline bool has_legacy_pic(void)
> > > +{
> > > > + return legacy_pic == &null_legacy_pic;
> > > +}
> > > +
> > shouldn't this be
> > > return legacy_pic == &default_legacy_pic;
> > ?
> != , but yes, I mess that up.

I see. I have changed that, and the platform still cannot boot.

thanks,
rui


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Peter Zijlstra
On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote:
> On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> > Hi, can you see if this makes your Surface boot?
> > 
> No, it does not boot.

Bah, staring at the lapic calibrate now, that is a bit of a mess..

> > I tested it on my IVB by making has_legacy_pic() return true
> > unconditionally.
> > 
> > [0.024000] tsc: Unable to calibrate against PIT
> > [0.025000] tsc: using HPET reference calibration
> > [0.026000] tsc: Detected 2792.451 MHz processor
> > 
> > ---
> > 
> > 
> > diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
> > index c8376b40e882..e2cfc4b52ee4 100644
> > --- a/arch/x86/include/asm/i8259.h
> > +++ b/arch/x86/include/asm/i8259.h
> > @@ -69,6 +69,11 @@ struct legacy_pic {
> >  extern struct legacy_pic *legacy_pic;
> >  extern struct legacy_pic null_legacy_pic;
> >  
> > +static inline bool has_legacy_pic(void)
> > +{
> > +   return legacy_pic == _legacy_pic;
> > +}
> > +
> shouldn't this be
>   return legacy_pic == _legacy_pic;
> ?

!= , but yes, I messed that up.


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-19 Thread Zhang Rui
On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote:
> Hi, can you see if this makes your Surface boot?
> 
No, it does not boot.

> I tested it on my IVB by making has_legacy_pic() return true
> unconditionally.
> 
> [0.024000] tsc: Unable to calibrate against PIT
> [0.025000] tsc: using HPET reference calibration
> [0.026000] tsc: Detected 2792.451 MHz processor
> 
> ---
> 
> 
> diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
> index c8376b40e882..e2cfc4b52ee4 100644
> --- a/arch/x86/include/asm/i8259.h
> +++ b/arch/x86/include/asm/i8259.h
> @@ -69,6 +69,11 @@ struct legacy_pic {
>  extern struct legacy_pic *legacy_pic;
>  extern struct legacy_pic null_legacy_pic;
>  
> +static inline bool has_legacy_pic(void)
> +{
> + return legacy_pic == _legacy_pic;
> +}
> +
shouldn't this be
return legacy_pic == _legacy_pic;
?

thanks,
rui
>  static inline int nr_legacy_irqs(void)
>  {
>   return legacy_pic->nr_legacy_irqs;
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 8ea117f8142e..2afc623b2280 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  unsigned int __read_mostly cpu_khz;  /* TSC clocks / usec, not used here */
>  EXPORT_SYMBOL(cpu_khz);
> @@ -363,6 +364,15 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin)
>   unsigned long tscmin, tscmax;
>   int pitcnt;
>  
> + if (!has_legacy_pic()) {
> + udelay(10 * USEC_PER_MSEC);
> + udelay(10 * USEC_PER_MSEC);
> + udelay(10 * USEC_PER_MSEC);
> + udelay(10 * USEC_PER_MSEC);
> + udelay(10 * USEC_PER_MSEC);
> + return ULONG_MAX;
> + }
> +
>   /* Set the Gate high, disable speaker */
>   outb((inb(0x61) & ~0x02) | 0x01, 0x61);
>  
> @@ -487,6 +497,9 @@ static unsigned long quick_pit_calibrate(void)
>   u64 tsc, delta;
>   unsigned long d1, d2;
>  
> + if (!has_legacy_pic())
> + return 0;
> +
>   /* Set the Gate high, disable speaker */
>   outb((inb(0x61) & ~0x02) | 0x01, 0x61);
>  


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-12-18 Thread Peter Zijlstra

Hi, can you see if this makes your Surface boot?

I tested it on my IVB by making has_legacy_pic() return true
unconditionally.

[0.024000] tsc: Unable to calibrate against PIT
[0.025000] tsc: using HPET reference calibration
[0.026000] tsc: Detected 2792.451 MHz processor

---


diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index c8376b40e882..e2cfc4b52ee4 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -69,6 +69,11 @@ struct legacy_pic {
 extern struct legacy_pic *legacy_pic;
 extern struct legacy_pic null_legacy_pic;
 
+static inline bool has_legacy_pic(void)
+{
+	return legacy_pic == _legacy_pic;
+}
+
 static inline int nr_legacy_irqs(void)
 {
 	return legacy_pic->nr_legacy_irqs;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f8142e..2afc623b2280 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -363,6 +364,15 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin)
 	unsigned long tscmin, tscmax;
 	int pitcnt;
 
+	if (!has_legacy_pic()) {
+		udelay(10 * USEC_PER_MSEC);
+		udelay(10 * USEC_PER_MSEC);
+		udelay(10 * USEC_PER_MSEC);
+		udelay(10 * USEC_PER_MSEC);
+		udelay(10 * USEC_PER_MSEC);
+		return ULONG_MAX;
+	}
+
 	/* Set the Gate high, disable speaker */
 	outb((inb(0x61) & ~0x02) | 0x01, 0x61);
 
@@ -487,6 +497,9 @@ static unsigned long quick_pit_calibrate(void)
 	u64 tsc, delta;
 	unsigned long d1, d2;
 
+	if (!has_legacy_pic())
+		return 0;
+
 	/* Set the Gate high, disable speaker */
 	outb((inb(0x61) & ~0x02) | 0x01, 0x61);
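
As a reading aid for the patch above, the fallback it aims for might be sketched roughly as follows. This is a simplified, standalone illustration and not the kernel's calibration code: the enum and the function pick_tsc_reference() are hypothetical names, and the only details taken from the thread are that a skipped PIT calibration reports ULONG_MAX and that the test log then shows the HPET reference being used.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical labels for the reference used to calibrate the TSC. */
enum tsc_ref { TSC_REF_PIT, TSC_REF_HPET, TSC_REF_NONE };

/*
 * pit_result: value reported by a PIT-based calibration attempt, with
 *             ULONG_MAX meaning "PIT calibration failed or was skipped"
 *             (the value the patch returns when there is no legacy PIC).
 * hpet_ok:    whether an HPET reference measurement is available
 *             (an assumption of this sketch).
 */
enum tsc_ref pick_tsc_reference(unsigned long pit_result, bool hpet_ok)
{
	if (pit_result != ULONG_MAX)
		return TSC_REF_PIT;

	/* PIT unusable: fall back to HPET if we have it, else give up. */
	return hpet_ok ? TSC_REF_HPET : TSC_REF_NONE;
}
```

With pit_result set to ULONG_MAX and hpet_ok true, the sketch selects the HPET path, which corresponds to the "using HPET reference calibration" line in the log quoted above.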
 


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-29 Thread Zhang Rui
On Tue, 2017-11-28 at 13:36 +0100, Peter Zijlstra wrote:
> On Tue, Nov 28, 2017 at 06:59:01PM +0800, Zhang Rui wrote:
> > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is
> > 
> > yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always freezes
> > with boot option "notscdeadline"/"lapic=notscdeadline".
> 
> Then for some mysterious reason, your Surface thing has a borked LAPIC.
> 
I agree. However, on this platform, forcing a CPU microcode upgrade
after a kernel upgrade does not seem like a good solution for users.
So I would still like to dig into the problem further to see if we can
work around this "regression" in software.
As the kernel freezes at a very early stage, do you have any advice on
how to narrow down the problem?

thanks,
rui


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Peter Zijlstra
On Tue, Nov 28, 2017 at 06:59:01PM +0800, Zhang Rui wrote:
> > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is

> yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always freezes
> with boot option "notscdeadline"/"lapic=notscdeadline".

Then for some mysterious reason, your Surface thing has a borked LAPIC.

No idea how or why..


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Zhang Rui
On Tue, 2017-11-28 at 10:34 +0100, Thomas Gleixner wrote:
> On Tue, 28 Nov 2017, Zhang Rui wrote:
> 
> > On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote:
> > > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote:
> > > > 
> > > > Hi, All,
> > > > 
> > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that
> > > > kernel freezes during boot, and the last message in the screen is
> > > > loading the initrd image. And I have bisected it to this commit
> > > > 
> > > > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE,   0xb2),
> > > And what microcode version do you run? Have you installed the latest
> > > microcode package and updated your initrd to include it? My skylake is
> > > running 0xba.
> > No, I didn't upgrade my microcode.
> > 
> > $ cat /proc/cpuinfo
> > ...
> > processor   : 0
> > vendor_id   : GenuineIntel
> > cpu family  : 6
> > model   : 78
> > model name  : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
> > stepping: 3
> > microcode   : 0x9e
> > ...
> > 
> > I suppose the problem should be gone if I upgrade the microcode.
> > But the real problem to me is that the system FREEZES after kernel
> > upgrade.
> Confused. So the match disables the deadline timer due to 0x9e < 0xb2. And
> not having deadline timer makes the boot fail despite the fact that
> deadline timer is borked with microcode < 0xb2.
> 
> Can you verify by adding 'lapic=notscdeadline' to the command line of a
> 'working' kernel? That should cause a freeze as well then.
> 
yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always freezes
with boot option "notscdeadline"/"lapic=notscdeadline".

thanks,
rui

> Thanks,
> 
>   tglx


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Peter Zijlstra
On Tue, Nov 28, 2017 at 04:22:15PM +0800, Zhang Rui wrote:
> On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote:
> > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote:
> > > 
> > > Hi, All,
> > > 
> > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that
> > > kernel freezes during boot, and the last message in the screen is
> > > loading the initrd image. And I have bisected it to this commit
> > > 
> > > +//   DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE,   0xb2),
> > And what microcode version do you run? Have you installed the latest
> > microcode package and updated your initrd to include it? My skylake is
> > running 0xba.
> 
> No, I didn't upgrade my microcode.
> 
> $ cat /proc/cpuinfo
> ...
> processor : 0
> vendor_id : GenuineIntel
> cpu family: 6
> model : 78
> model name: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
> stepping  : 3
> microcode : 0x9e
> ...
> 
> I suppose the problem should be gone if I upgrade the microcode.
> But the real problem to me is that the system FREEZES after kernel
> upgrade.

Yes, that is weird and unexpected. The patch basically kills
TSC_DEADLINE usage on older microcode. This makes the kernel then use
the old LAPIC timer mode.

Does your machine boot with "notscdeadline" ? Which _should_ be the
same.


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Thomas Gleixner
On Tue, 28 Nov 2017, Zhang Rui wrote:

> On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote:
> > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote:
> > > 
> > > Hi, All,
> > > 
> > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that
> > > kernel freezes during boot, and the last message in the screen is
> > > loading the initrd image. And I have bisected it to this commit
> > > 
> > > +//   DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE,   0xb2),
> > And what microcode version do you run? Have you installed the latest
> > microcode package and updated your initrd to include it? My skylake is
> > running 0xba.
> 
> No, I didn't upgrade my microcode.
> 
> $ cat /proc/cpuinfo
> ...
> processor : 0
> vendor_id : GenuineIntel
> cpu family: 6
> model : 78
> model name: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
> stepping  : 3
> microcode : 0x9e
> ...
> 
> I suppose the problem should be gone if I upgrade the microcode.
> But the real problem to me is that the system FREEZES after kernel
> upgrade.

Confused. So the match disables the deadline timer due to 0x9e < 0xb2. And
not having deadline timer makes the boot fail despite the fact that
deadline timer is borked with microcode < 0xb2.

Can you verify by adding 'lapic=notscdeadline' to the command line of a
'working' kernel? That should cause a freeze as well then.

Thanks,

tglx
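
The comparison described above (0x9e < 0xb2, so the deadline timer is disabled) can be illustrated with a small standalone sketch. It is not the kernel's quirk code: the function tsc_deadline_allowed() and the macro SKL_MOBILE_MIN_UCODE are hypothetical names, and only the revision values 0x9e, 0xb2 and 0xba come from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimum microcode revision quoted in the thread for Skylake mobile.
 * The macro name is made up for this sketch.
 */
#define SKL_MOBILE_MIN_UCODE 0xb2u

/*
 * Returns true when TSC-deadline mode would be left enabled, that is,
 * when the running microcode is at least the required revision.
 */
bool tsc_deadline_allowed(uint32_t running_rev, uint32_t min_rev)
{
	return running_rev >= min_rev;
}
```

Under these assumptions, the reporter's revision 0x9e fails the check, while the 0xba revision mentioned earlier in the thread passes it.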


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Zhang Rui
On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote:
> On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote:
> > 
> > Hi, All,
> > 
> > My Surface Pro 4 is unable to boot after 4.12. The symptom is that
> > kernel freezes during boot, and the last message in the screen is
> > loading the initrd image. And I have bisected it to this commit
> > 
> > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE,   0xb2),
> And what microcode version do you run? Have you installed the latest
> microcode package and updated your initrd to include it? My skylake is
> running 0xba.

No, I didn't upgrade my microcode.

$ cat /proc/cpuinfo
...
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 78
model name  : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
stepping: 3
microcode   : 0x9e
...

I suppose the problem should be gone if I upgrade the microcode.
But the real problem to me is that the system FREEZES after kernel
upgrade.

thanks,
rui
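
For readers who want to check the running microcode revision programmatically, the "microcode" line from the /proc/cpuinfo output above could be parsed along these lines. This is a minimal sketch in plain userspace C; parse_microcode_line() is a hypothetical helper, and it assumes the line has the "microcode : 0x..." shape shown in the output above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parses one /proc/cpuinfo-style line. If it is a "microcode" line,
 * stores the hexadecimal revision in *rev and returns 1; otherwise
 * returns 0 and leaves *rev untouched.
 */
int parse_microcode_line(const char *line, unsigned int *rev)
{
	const char *colon;
	unsigned int value;

	if (strncmp(line, "microcode", 9) != 0)
		return 0;

	colon = strchr(line, ':');
	if (!colon)
		return 0;

	/* %x accepts an optional 0x prefix, so "0x9e" parses as 0x9e. */
	if (sscanf(colon + 1, " %x", &value) != 1)
		return 0;

	*rev = value;
	return 1;
}
```

A caller would typically read /proc/cpuinfo line by line with fgets() and stop at the first line for which this helper returns 1.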


Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL

2017-11-28 Thread Peter Zijlstra
On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote:
> Hi, All,
> 
> My Surface Pro 4 is unable to boot after 4.12. The symptom is that
> kernel freezes during boot, and the last message in the screen is
> loading the initrd image. And I have bisected it to this commit

> +//   DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE,   0xb2),

And what microcode version do you run? Have you installed the latest
microcode package and updated your initrd to include it? My skylake is
running 0xba.

