Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-04 Thread Ni, Ray
Michael,
do you have any updated patch?

Thanks,
Ray

From: Michael Brown 
Sent: Friday, March 1, 2024 19:10
To: Paolo Bonzini 
Cc: Ni, Ray ; devel@edk2.groups.io ; 
Kinney, Michael D ; Liming Gao 
; Laszlo Ersek 
Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to 
nested interrupts

On 01/03/2024 09:33, Paolo Bonzini wrote:
> On Fri, Mar 1, 2024 at 10:27 AM Michael Brown  wrote:
>> It's possible that it doesn't matter.  The new logic will effectively
>> mean that RestoreTPL() will restore not only the TPL but also the
>> interrupts-enabled state to whatever existed at the time of the
>> corresponding RaiseTPL().
>
> Right: that's what my comment says
>
> +  // However, when the handler calls RestoreTPL
> +  // before returning, we want to keep interrupts disabled.  This
> +  // restores the exact state at the beginning of the handler,
> +  // before the call to RaiseTPL(): low TPL and interrupts disabled.
>
> but indeed it applies beyond interrupt handlers. It might even be a bugfix.

Right.  I'm leaning towards treating this as a bugfix: essentially
tightening up the semantics of RestoreTPL() to mean:

- any callbacks in the range OldTpl < Tpl < gEfiCurrentTpl will be
dispatched with interrupts unconditionally enabled

- the TPL will be restored to OldTpl

- the interrupt state will be restored to the value it had when the TPL
was last raised from OldTpl

It feels as though this should be able to be cleanly modelled with a
single global state array

   BOOLEAN mSavedInterruptState[TPL_HIGH_LEVEL]

(or possibly a bitmask, though using the array avoids having to disable
interrupts just to write a value).

I still need to think through the subtleties, to make sure it could cope
with pathological edge cases such as

   OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL);

   ...

   gBS->RestoreTPL (OldTpl);
   gBS->RestoreTPL (OldTpl);

or

   OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL - 1);
   gBS->RaiseTPL (TPL_HIGH_LEVEL);

   ..

   gBS->RestoreTPL (OldTpl);

I think that at least one of the above pathological usage patterns would
break the existing mInterruptedTplMask patches, since they currently
clear state in RestoreTPL() and so will not correctly handle a duplicate
call to RestoreTPL().

I'll try to get a patch put together over the weekend.

Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116352): https://edk2.groups.io/g/devel/message/116352
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Paolo Bonzini
Il ven 1 mar 2024, 12:10 Michael Brown  ha scritto:

> It feels as though this should be able to be cleanly modelled with a
> single global state array
>
>BOOLEAN mSavedInterruptState[TPL_HIGH_LEVEL]
>

Pretty much, yes. But you only have to write it when the interrupts are
already disabled, so the bitmask works and makes it easier to clear "all
values at NewTpl and above" with just an AND.

>
> (or possibly a bitmask, though using the array avoids having to disable
> interrupts just to write a value).
>
> I still need to think through the subtleties, to make sure it could cope
> with pathological edge cases such as
>
>OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL);
>
>...
>
>gBS->RestoreTPL (OldTpl);
>gBS->RestoreTPL (OldTpl);
>
> or
>
>OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL - 1);
>gBS->RaiseTPL (TPL_HIGH_LEVEL);
>
>..
>
>gBS->RestoreTPL (OldTpl);
>
> I think that at least one of the above pathological usage patterns would
> break the existing mInterruptedTplMask patches, since they currently
> clear state in RestoreTPL() and so will not correctly handle a duplicate
> call to RestoreTPL().
>

I think my patch works (modulo the 1 vs. 1U issue) for the second.
Declaring the first to be invalid is a good idea IMO. Also it would only
break in interrupt handlers and would revert to just causing a stack
overflow in the interrupt storm scenario, so it wouldn't be too bad...

Paolo


> I'll try to get a patch put together over the weekend.
>
> Thanks,
>
> Michael
>
>


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116238): https://edk2.groups.io/g/devel/message/116238
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Michael Brown

On 01/03/2024 09:33, Paolo Bonzini wrote:

On Fri, Mar 1, 2024 at 10:27 AM Michael Brown  wrote:

It's possible that it doesn't matter.  The new logic will effectively
mean that RestoreTPL() will restore not only the TPL but also the
interrupts-enabled state to whatever existed at the time of the
corresponding RaiseTPL().


Right: that's what my comment says

+  // However, when the handler calls RestoreTPL
+  // before returning, we want to keep interrupts disabled.  This
+  // restores the exact state at the beginning of the handler,
+  // before the call to RaiseTPL(): low TPL and interrupts disabled.

but indeed it applies beyond interrupt handlers. It might even be a bugfix.


Right.  I'm leaning towards treating this as a bugfix: essentially 
tightening up the semantics of RestoreTPL() to mean:


- any callbacks in the range OldTpl < Tpl < gEfiCurrentTpl will be 
dispatched with interrupts unconditionally enabled


- the TPL will be restored to OldTpl

- the interrupt state will be restored to the value it had when the TPL 
was last raised from OldTpl


It feels as though this should be able to be cleanly modelled with a 
single global state array


  BOOLEAN mSavedInterruptState[TPL_HIGH_LEVEL]

(or possibly a bitmask, though using the array avoids having to disable 
interrupts just to write a value).


I still need to think through the subtleties, to make sure it could cope 
with pathological edge cases such as


  OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL);

  ...

  gBS->RestoreTPL (OldTpl);
  gBS->RestoreTPL (OldTpl);

or

  OldTpl = gBS->RaiseTPL (TPL_HIGH_LEVEL - 1);
  gBS->RaiseTPL (TPL_HIGH_LEVEL);

  ..

  gBS->RestoreTPL (OldTpl);

I think that at least one of the above pathological usage patterns would 
break the existing mInterruptedTplMask patches, since they currently 
clear state in RestoreTPL() and so will not correctly handle a duplicate 
call to RestoreTPL().


I'll try to get a patch put together over the weekend.

Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116232): https://edk2.groups.io/g/devel/message/116232
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Paolo Bonzini
On Fri, Mar 1, 2024 at 10:27 AM Michael Brown  wrote:
> > My version is attached, feel free to reuse it (either entirely or
> > partially) for a hypothetical v2. Apologies to you and Mike K for the
> > confusion!
>
> I much prefer this version of the patch, because the explanations are
> easier to follow and to reason about.

Thanks! I find the new logic to be quite appealing... now that I
understand it. Hopefully this provides a way to find the best of both
worlds.

> There is an implicit assumption that if interrupts are disabled when
> RaiseTPL() is called then we must have been called from an interrupt
> handler.  How sure are we that this assumption is correct?
>
> It's possible that it doesn't matter.  The new logic will effectively
> mean that RestoreTPL() will restore not only the TPL but also the
> interrupts-enabled state to whatever existed at the time of the
> corresponding RaiseTPL().

Right: that's what my comment says

+  // However, when the handler calls RestoreTPL
+  // before returning, we want to keep interrupts disabled.  This
+  // restores the exact state at the beginning of the handler,
+  // before the call to RaiseTPL(): low TPL and interrupts disabled.

but indeed it applies beyond interrupt handlers. It might even be a bugfix.

> Maybe this is what we want?  If so, then we
> should probably phrase the comments in these terms instead of in terms
> of being called from an interrupt handler.

I think phrasing the comments with reference to interrupt handlers is
fine, but it may be worth adding a comment to either
mInterruptedTplMask or CoreRestoreTpl(), like

+/// NOTE: Strictly speaking, this applies to anything that
+/// calls RaiseTPL() with interrupts disabled, not just
+/// interrupt handlers.  Interrupt handlers are just the case
+/// that we care the most about, because of the potential
+/// for stack overflow.

Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116228): https://edk2.groups.io/g/devel/message/116228
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Michael Brown

On 01/03/2024 08:37, Paolo Bonzini wrote:

So I am starting to see more and more similarities between the two
approaches.  I went a step further with fresh mind, removing the while
loop... and basically reinvented your and Michael's patch. :) The only
difference in the logic is a slightly different handling of
mInterruptedTplMask in CoreRestoreTpl(), which is a bit safer in my
case.

However, my roundabout way of getting to the same patch resulted in
very different comments. Personally, I found the large text at the
head of mInterruptedTplMask a bit too much, and the ones inside the
function too focused on "how" and not "why". Maybe it's my exposure to
NestedInterruptTplLib, but I find that a much smaller text can achieve
the same purpose, by explaining the logic instead of the individual
steps.

My version is attached, feel free to reuse it (either entirely or
partially) for a hypothetical v2. Apologies to you and Mike K for the
confusion!


I much prefer this version of the patch, because the explanations are 
easier to follow and to reason about.



Minor point (applies to both your and Ray's versions):

- The use of gCpu->GetInterruptState() vs CoreSetInterruptState() is 
inconsistent.  It feels as though CoreGetInterruptState() ought to 
exist.  I assume that the PI spec defines whether interrupts are enabled 
or disabled prior to the CPU arch protocol being installed, so it should 
be possible to have a well-defined return value from 
CoreGetInterruptState().



Major point:

There is an implicit assumption that if interrupts are disabled when 
RaiseTPL() is called then we must have been called from an interrupt 
handler.  How sure are we that this assumption is correct?


It's possible that it doesn't matter.  The new logic will effectively 
mean that RestoreTPL() will restore not only the TPL but also the 
interrupts-enabled state to whatever existed at the time of the 
corresponding RaiseTPL().  Maybe this is what we want?  If so, then we 
should probably phrase the comments in these terms instead of in terms 
of being called from an interrupt handler.



Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116227): https://edk2.groups.io/g/devel/message/116227
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Ni, Ray
Paolo,
Happy weekends!
Thanks! I will read it on my next Monday.

Thanks,
Ray
> -Original Message-
> From: Paolo Bonzini 
> Sent: Friday, March 1, 2024 4:44 PM
> To: Ni, Ray 
> Cc: devel@edk2.groups.io; Kinney, Michael D ;
> Liming Gao ; Laszlo Ersek ;
> Michael Brown 
> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
> due to nested interrupts
> 
> One fix is needed in the code.
> 
> On Thu, Feb 29, 2024 at 2:04 PM Ray Ni  wrote:
> > +  //
> > +  // Save the "Interrupted TPL" (TPL that was interrupted).
> > +  //
> > +  mInterruptedTplMask |= (UINTN)(1 << gEfiCurrentTpl);
> > +}
> >}
> 
> > +  //
> > +  // Clear interrupted TPL level mask, but do not re-enable interrupts
> here
> > +  // This will return to CoreTimerTick() and interrupts will be
> re-enabled
> > +  // when the timer interrupt handlers returns from interrupt
> context.
> > +  //
> > +  ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64
> (mInterruptedTplMask));
> > +  mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> > +}
> >}
> 
> Both of these need to use "1U" to avoid sign extending bit 31 into bits 
> 31..63.
> 
> The same issue is (in three places) present in my own version of the patch. :(
> 
> Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116226): https://edk2.groups.io/g/devel/message/116226
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Paolo Bonzini
One fix is needed in the code.

On Thu, Feb 29, 2024 at 2:04 PM Ray Ni  wrote:
> +  //
> +  // Save the "Interrupted TPL" (TPL that was interrupted).
> +  //
> +  mInterruptedTplMask |= (UINTN)(1 << gEfiCurrentTpl);
> +}
>}

> +  //
> +  // Clear interrupted TPL level mask, but do not re-enable interrupts 
> here
> +  // This will return to CoreTimerTick() and interrupts will be 
> re-enabled
> +  // when the timer interrupt handlers returns from interrupt context.
> +  //
> +  ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64 (mInterruptedTplMask));
> +  mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> +}
>}

Both of these need to use "1U" to avoid sign extending bit 31 into bits 31..63.

The same issue is (in three places) present in my own version of the patch. :(

Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116225): https://edk2.groups.io/g/devel/message/116225
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-03-01 Thread Paolo Bonzini
On Fri, Mar 1, 2024 at 4:08 AM Ni, Ray  wrote:
> @@ -161,5 +191,46 @@ CoreRestoreTpl (
>IN EFI_TPL NewTpl
>)
>  {
> +  BOOLEAN InInterruptHandler = FALSE;
> +
> +  //
> +  // Unwind the nested interrupt handlers up to the required
> +  // TPL, paying attention not to overflow the stack.  While
> +  // not strictly necessary according to the specification,
> +  // accept the possibility that multiple RaiseTPL calls are
> +  // undone by a single RestoreTPL
> +  //
> +  while ((INTN)NewTpl <= HighBitSet64 (mInterruptedTplMask)) {
> 1. why "<="? I thought when RestoreTPL() is called there are only two cases:
>a. NewTpl == HighBitSet64 (...)
>b. NewTpl > HighBitSet64 (...)
>   1.a is the case when TimerInterruptHandler() or CoreTimerTick() restores
>   TPL from HIGH to non-HIGH.
>   1.b is the case when the pending event backs call RaiseTPL/RestoreTPL().
>   Because only pending events whose TPL > "Interrupted TPL" can run, the
>   RestoreTPL() call from the event callbacks cannot change the TPL to a value
>   less than or equal to "Interrupted TPL".
>   So, I think "<=" can be "==".
>
> 2. can you explain a bit more about the reason of "while"?

Both are just for extra safety. The required invariant is that all
bits at or below current TPL are cleared, and using "while (... <=
...)" makes it more robust to incorrect usage of gBS->RestoreTPL().

Indeed, the patch at the top of thread also uses "(INTN)gEfiCurrentTpl
> HighBitSet64 (mInterruptedTplMask)", which is <= when you reverse
the condition.  It then asserts inside the conditional that "==" would
be enough.

So I am starting to see more and more similarities between the two
approaches.  I went a step further with fresh mind, removing the while
loop... and basically reinvented your and Michael's patch. :) The only
difference in the logic is a slightly different handling of
mInterruptedTplMask in CoreRestoreTpl(), which is a bit safer in my
case.

However, my roundabout way of getting to the same patch resulted in
very different comments. Personally, I found the large text at the
head of mInterruptedTplMask a bit too much, and the ones inside the
function too focused on "how" and not "why". Maybe it's my exposure to
NestedInterruptTplLib, but I find that a much smaller text can achieve
the same purpose, by explaining the logic instead of the individual
steps.

My version is attached, feel free to reuse it (either entirely or
partially) for a hypothetical v2. Apologies to you and Mike K for the
confusion!

> +
> +if (InterruptedTpl == NewTpl) {
> +  break;
> 3. "break" or "return"? I think we should exit from this function.

Indeed, this should have been a return.

Paolo


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116224): https://edk2.groups.io/g/devel/message/116224
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


From b9f0bc3ef83b40c29e093acda1d0741b8f5610e5 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Fri, 1 Mar 2024 09:11:48 +0100
Subject: [PATCH] MdeModulePkg: fix stack overflow issue due to nested
 interrupts
Content-Type: text/plain; charset=UTF-8

This is a heavily simplified version of the fix in the OvmfPkg
NestedInterruptTplLib.  Putting it in DXE core allows CoreRestoreTpl()
to lower the TPL while keeping interrupts disabled, removing the need
for either DisableInterruptsOnIret() or the complex deferred execution
mechanism.

Instead, CoreRaiseTpl() uses the current state of the interrupt flag to
second guess whether it's being called from an interrupt handler; when
restoring the outer TPL at the end of the handler, interrupts remain
disabled until IRET.  This eliminates the possibility that a nested
invocation of the interrupt handler has the same TPL as the outer one.

Signed-off-by: Michael D Kinney 
Signed-off-by: Ray Ni 
[Rewrote the CoreRestoreTpl part and the commit message. - Paolo]
Signed-off-by: Paolo Bonzini 
---
 MdeModulePkg/Core/Dxe/Event/Tpl.c | 85 ++-
 1 file changed, 72 insertions(+), 13 deletions(-)

diff --git a/MdeModulePkg/Core/Dxe/Event/Tpl.c b/MdeModulePkg/Core/Dxe/Event/Tpl.c
index b33f80573c..0a4f99521c 100644
--- a/MdeModulePkg/Core/Dxe/Event/Tpl.c
+++ b/MdeModulePkg/Core/Dxe/Event/Tpl.c
@@ -9,6 +9,16 @@ SPDX-License-Identifier: BSD-2-Clause-Patent
 #include "DxeMain.h"
 #include "Event.h"
 
+///
+/// Bit mask of TPLs that were interrupted (typically during RestoreTPL's
+/// event dispatching, though there are reports that the Windows boot loader
+/// executes stray STIs at TPL_HIGH_LEVEL).  CoreRaiseTpl() sets the
+/// OldTpl-th bit when it detects it was called from and interrupt handler,
+/// because the corresponding CoreRestoreTpl() needs different semantics for
+/// the CPU interrupt state.  See CoreRaiseTpl() and CoreRestoreTpl() below.
+///

Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Ni, Ray
I think we are all aligned on the purpose. It's to avoid enabling the 
interrupts in the end of RestoreTPL (HIGH->non-HIGH) in the interrupt context.
The discussion is about how to implement it.

Michael Brown's idea is to avoid changing DxeCore but add a customized 
RaiseTpl/RestoreTpl implementation in a lib and request Timer driver calls it.
That lib was implemented very smartly. It includes while-loop, 
implicitly-recursive, implicitly-requiring NESTED_INTERRUPT_STATE in global 
storage not in stack as local variable.
I really do NOT like the future that every timer driver calls that lib to avoid 
the potential stack overflow. It's so complicated! And it's called in every 
10ms!!

Paolo,
I don't fully understand your patch especially the following changes.
3 comments embedded.

@@ -161,5 +191,46 @@ CoreRestoreTpl (
   IN EFI_TPL NewTpl
   )
 {
+  BOOLEAN InInterruptHandler = FALSE;
+
+  //
+  // Unwind the nested interrupt handlers up to the required
+  // TPL, paying attention not to overflow the stack.  While
+  // not strictly necessary according to the specification,
+  // accept the possibility that multiple RaiseTPL calls are
+  // undone by a single RestoreTPL
+  //
+  while ((INTN)NewTpl <= HighBitSet64 (mInterruptedTplMask)) {
1. why "<="? I thought when RestoreTPL() is called there are only two cases:
   a. NewTpl == HighBitSet64 (...)
   b. NewTpl > HighBitSet64 (...)
  1.a is the case when TimerInterruptHandler() or CoreTimerTick() restores
  TPL from HIGH to non-HIGH.
  1.b is the case when the pending event backs call RaiseTPL/RestoreTPL().
  Because only pending events whose TPL > "Interrupted TPL" can run, the
  RestoreTPL() call from the event callbacks cannot change the TPL to a value
  less than or equal to "Interrupted TPL".
  So, I think "<=" can be "==".

2. can you explain a bit more about the reason of "while"?



+UINTN InterruptedTpl = HighBitSet64 (mInterruptedTplMask);
+mInterruptedTplMask &= ~(UINTN)(1 << InterruptedTpl);
+
+ASSERT (GetInterruptState () == FALSE);
+InInterruptHandler = TRUE;
+
+//
+// Take the TPL down a notch to allow event notifications to be
+// dispatched.  This will implicitly re-enable interrupts (if
+// InterruptedTPL is below TPL_HIGH_LEVEL), even though we are
+// still inside the interrupt handler, but the new TPL will
+// be set while they are disabled.
+//
+// DesiredInterruptState must be FALSE to ensure that the
+// stack does not blow up.  If we used, as in the final call
+// below, "InterruptedTpl < TPL_HIGH_LEVEL", the timer interrupt
+// handler could be invoked repeatedly in the small window between
+// CoreSetInterruptState (TRUE) and the IRET instruction.
+//
+CoreRestoreTplInternal (InterruptedTpl, FALSE);
+
+if (InterruptedTpl == NewTpl) {
+  break;
3. "break" or "return"? I think we should exit from this function.


+}
+  }
+
+  //
+  // If we get here with InInterruptHandler == TRUE, an interrupt
+  // handler forgot to restore the TPL.
+  //
+  ASSERT (!InInterruptHandler);
   CoreRestoreTplInternal (NewTpl, NewTpl < TPL_HIGH_LEVEL);
 }

Thanks,
Ray
> -Original Message-
> From: Paolo Bonzini 
> Sent: Friday, March 1, 2024 8:14 AM
> To: Ni, Ray 
> Cc: devel@edk2.groups.io; Kinney, Michael D ;
> Liming Gao ; Laszlo Ersek ;
> Michael Brown 
> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
> due to nested interrupts
> 
> On Thu, Feb 29, 2024 at 2:04 PM Ray Ni  wrote:
> > @@ -134,9 +262,9 @@ CoreRestoreTpl (
> >}
> >
> >//
> > -  // Set the new value
> > +  // Set the new TPL with interrupt disabled.
> >//
> > -
> > +  CoreSetInterruptState (FALSE);
> >gEfiCurrentTpl = NewTpl;
> >
> >//
> > @@ -144,7 +272,22 @@ CoreRestoreTpl (
> >// interrupts are enabled
> >//
> >if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
> > -CoreSetInterruptState (TRUE);
> > +if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) {
> > +  //
> > +  // Only enable interrupts if restoring to a level above the highest
> > +  // interrupted TPL level.  This allows interrupt nesting, but only 
> > for
> > +  // events at higher TPL level than the current TPL level.
> > +  //
> > +  CoreSetInterruptState (TRUE);
> > +} else {
> > +  //
> > +  // Clear interrupted TPL level mask, but do not re-enable interrupts
> here
> > +  // This will return to CoreTimerTick() and interrupts will be
> re-enabled
> > +  // when the timer interrupt handlers returns from interrupt
> context.
> > +  //
> > +  ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64
> (mInterruptedTplMask));
> > +  mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> > +}
> >}
> 
> Ok, now I understand what's going on and it's indeed the same logic as
> NestedInterruptTplLib, with DisableInterruptsOnIret() replaced by
> skipping CoreSetInterruptState(TRUE). It's similar to what I proposed
> 

Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Paolo Bonzini
On Thu, Feb 29, 2024 at 2:04 PM Ray Ni  wrote:
> @@ -134,9 +262,9 @@ CoreRestoreTpl (
>}
>
>//
> -  // Set the new value
> +  // Set the new TPL with interrupt disabled.
>//
> -
> +  CoreSetInterruptState (FALSE);
>gEfiCurrentTpl = NewTpl;
>
>//
> @@ -144,7 +272,22 @@ CoreRestoreTpl (
>// interrupts are enabled
>//
>if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
> -CoreSetInterruptState (TRUE);
> +if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) {
> +  //
> +  // Only enable interrupts if restoring to a level above the highest
> +  // interrupted TPL level.  This allows interrupt nesting, but only for
> +  // events at higher TPL level than the current TPL level.
> +  //
> +  CoreSetInterruptState (TRUE);
> +} else {
> +  //
> +  // Clear interrupted TPL level mask, but do not re-enable interrupts 
> here
> +  // This will return to CoreTimerTick() and interrupts will be 
> re-enabled
> +  // when the timer interrupt handlers returns from interrupt context.
> +  //
> +  ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64 (mInterruptedTplMask));
> +  mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> +}
>}

Ok, now I understand what's going on and it's indeed the same logic as
NestedInterruptTplLib, with DisableInterruptsOnIret() replaced by
skipping CoreSetInterruptState(TRUE). It's similar to what I proposed
elsewhere in the thread, just written differenty.

I agree with Michael Brown that the spec is unclear on the state of
the interrupt flag on exit from gBS->RestoreTPL(), but perhaps this
change is feasible if the interrupt handlers just raise the TPL first
and restore it last.

Just as an exercise for me to understand the code better, I tried
rewriting the code in terms of the CoreRestoreTplInternal() function
that I proposed. I find it easier to read, but I guess that's a bit in
the eye of the beholder, and it is a little more defensively coded. I
attach it (untested beyond compilation) for reference.

Paolo


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116192): https://edk2.groups.io/g/devel/message/116192
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


From 23c4f60cf79f29ab5eff55a02c72bb504804d02a Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Thu, 29 Feb 2024 23:54:50 +0100
Subject: [PATCH 1/2] MdeModulePkg: introduce CoreRestoreTplInternal()
Content-Type: text/plain; charset=UTF-8

Introduce a function that restores the TPL just like gBS->RestoreTPL(),
but can optionally return with interrupts disabled.  This can be used
from interrupt handlers, so that any nested interrupt handler will only
see an elevated TPL and not the value on entry to the interrupt handler.

Signed-off-by: Paolo Bonzini 
---
 MdeModulePkg/Core/Dxe/Event/Tpl.c | 46 +--
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/MdeModulePkg/Core/Dxe/Event/Tpl.c b/MdeModulePkg/Core/Dxe/Event/Tpl.c
index b33f80573c..fe95ea3896 100644
--- a/MdeModulePkg/Core/Dxe/Event/Tpl.c
+++ b/MdeModulePkg/Core/Dxe/Event/Tpl.c
@@ -90,10 +90,11 @@ CoreRaiseTpl (
   @param  NewTpl  New, lower, task priority
 
 **/
-VOID
+static VOID
 EFIAPI
-CoreRestoreTpl (
-  IN EFI_TPL  NewTpl
+CoreRestoreTplInternal (
+  IN EFI_TPL  NewTpl,
+  IN BOOLEAN  DesiredInterruptState
   )
 {
   EFI_TPL  OldTpl;
@@ -107,11 +108,6 @@ CoreRestoreTpl (
 
   ASSERT (VALID_TPL (NewTpl));
 
-  //
-  // If lowering below HIGH_LEVEL, make sure
-  // interrupts are enabled
-  //
-
   if ((OldTpl >= TPL_HIGH_LEVEL) &&  (NewTpl < TPL_HIGH_LEVEL)) {
 gEfiCurrentTpl = TPL_HIGH_LEVEL;
   }
@@ -126,6 +122,11 @@ CoreRestoreTpl (
 }
 
 gEfiCurrentTpl = PendingTpl;
+
+//
+// If lowering below HIGH_LEVEL, make sure
+// interrupts are enabled
+//
 if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
   CoreSetInterruptState (TRUE);
 }
@@ -134,16 +135,31 @@ CoreRestoreTpl (
   }
 
   //
-  // Set the new value
+  // Set the new TPL with interrupt disabled.  If DesiredInterruptState
+  // is FALSE, this ensures that any nested interrupt handler will only
+  // see an elevated TPL and not NewTpl.
   //
-
+  CoreSetInterruptState (FALSE);
   gEfiCurrentTpl = NewTpl;
 
-  //
-  // If lowering below HIGH_LEVEL, make sure
-  // interrupts are enabled
-  //
-  if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
+  if (DesiredInterruptState) {
+ASSERT(gEfiCurrentTpl < TPL_HIGH_LEVEL);
 CoreSetInterruptState (TRUE);
   }
 }
+
+/**
+  Lowers the task priority to the previous value.   If the new
+  priority unmasks events at a higher priority, they are dispatched.
+
+  @param  NewTpl  New, lower, task priority
+
+**/
+VOID
+EFIAPI
+CoreRestoreTpl (
+  IN EFI_TPL NewTpl
+  )
+{
+  CoreRestoreTplInternal (NewTpl, NewTpl < 

Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Paolo Bonzini

On 2/29/24 20:22, Michael Brown wrote:

The design of NestedInterruptTplLib is that each nested interrupt must increase 
the TPL, but if I understand correctly there is a hole here:

   //
   // Call RestoreTPL() to allow event notifications to be
   // dispatched.  This will implicitly re-enable interrupts.
   //
   gBS->RestoreTPL (InterruptedTPL);

   //
   // Re-disable interrupts after the call to RestoreTPL() to ensure
   // that we have exclusive access to the shared state.
   //
   DisableInterrupts ();

because gBS->RestoreTPL will unconditionally enable interrupts if InterruptedTPL 
< TPL_HIGH_LEVEL.


There's no hole there.


Yes, what I meant is that the whole of NestedInterruptTplLib is designed 
around plugging that hole.  But if you can avoid re-enabling interrupts 
at the end of CoreRestoreTpl(), you don't need DisableInterruptsOnIret() 
anymore.


See the comments in the code for further details - I made them fairly 
extensive.  


Yup, I remember. :)

If possible, the easiest solution would be to merge 
NestedInterruptTplLib into Core DXE.


The question with that approach would be how to cleanly violate the 
abstraction layer that separates the timer interrupt handler (existing 
in a separate DXE driver executable) from the implementation of 
CoreRestoreTplInternal() (existing in core DXE and not exposed via the 
boot services table).


See outline in the other mail I have sent.

Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116190): https://edk2.groups.io/g/devel/message/116190
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Paolo Bonzini

On 2/29/24 20:16, Kinney, Michael D wrote:




-Original Message-
From: Paolo Bonzini 
Sent: Thursday, February 29, 2024 11:04 AM
To: Ni, Ray ; devel@edk2.groups.io
Cc: Kinney, Michael D ; Liming Gao
; Laszlo Ersek ; Michael
Brown 
Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
due to nested interrupts

On 2/29/24 14:02, Ray Ni wrote:

In the end, it will lower the TPL to TPL_APPLICATION with interrupt

enabled.


However, it's possible that another timer interrupt happens just in

the end

of RestoreTPL() function when TPL is TPL_APPLICATION.


How do non-OVMF platforms solve the issue?  Do they just have the same
bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?


Yes.  This same issue can be reproduced on non-OVMF platforms.

This proposal here is an attempt to integrate a common fix into the DXE Core.

I would agree conceptually that integrating the NestedInterruptTplLib work
into the DXE Core is another option.

I believe the root cause of all of these scenarios is enabling interrupts
in RestoreTPL() when processing a timer interrupt between the last processed
event and the return from the interrupt handler. Ther are some instances
of the Timer Arch Protocol implementation that call Raise/Restore TPL, so
we want a DXE Core change that is compatible with the DXE Core doing 
Raise/Restore
when processing a timer interrupt and the Timer Arch Protocol implementation
also doing the Raise/Restore TPL.


Ok, now I understand better.

The reason why the NestedInterruptTplLib was introduced (as opposed to 
doing it in core DXE) was to enable returning with disabled interrupts 
from the nested interrupt handler, but I think it can be done with a 
function like the CoreRestoreTplInternal() I outlined in the previous 
email, which is the same as current CoreRestoreTpl() but finishes with


  if (!DesiredInterruptState) {
CoreSetInterruptState (FALSE);
  }
  gEfiCurrentTpl = NewTpl;
  if (DesiredInterruptState) {
ASSERT (gEfiCurrentTpl < TPL_HIGH_LEVEL);
CoreSetInterruptState (TRUE);
  }

The new CoreRaiseTpl would be the same as in Ray and your patch, while 
the CoreRestoreTpl would be something like this:


if (NewTpl == HighBitSet64 (mInterruptedTplMask)) {
  static NESTED_INTERRUPT_STATE NestedInterruptState;
  mInterruptedTplMask &= ~(UINTN)(1 << NewTpl);
  //
  // Use the deferred invocation logic that is currently
  // in NestedInterruptTplLib.
  //
  // But unlike current NestedInterruptRestoreTPL(), if the logic
  // is part of core DXE, the
  //
  //gBS->RestoreTPL (InterruptedTPL);
  //DisableInterrupts ();
  //
  // pair that requires "disable interrupts on IRET" logic can
  // be done without ever enabling interrupts,  with
  // CoreRestoreTplInternal(InterruptedTPL, FALSE)
  //
  // As an aside, NestedInterruptState might as well become a
  // pair of globals.
  //
  NestedInterruptRestoreTPL (NewTpl, );
} else {
  CoreRestoreTplInternal(NewTpl, NewTpl < TPL_HIGH_LEVEL);
}

Requiring matching raise/restore pairs is a bit scary.  It can be 
avoided by changing the "if" to a


  while (NewTpl >= HighBitSet64 (mInterruptedTplMask))
mInterruptedTplMask &=
  ~(UINTN)(1 << HighBitSet64 (mInterruptedTplMask));

Then, if inlining NestedInterruptRestoreTPL() allows simplifications, 
they can be done on top after the merge of NestedInterruptTplLib.  In 
particular, I suspect that the while loop above can be unified with the 
loop in NestedInterruptRestoreTPL().  But again, that would be best 
reviewed as a separate change.


All this, as Michael said, is however conditional on being able to deal 
with the TPL_HIGH_LEVEL+STI shenanigans that Windows does.


Paolo



The design of NestedInterruptTplLib is that each nested interrupt must
increase the TPL, but if I understand correctly there is a hole here:

//
// Call RestoreTPL() to allow event notifications to be
// dispatched.  This will implicitly re-enable interrupts.
//
gBS->RestoreTPL (InterruptedTPL);

//
// Re-disable interrupts after the call to RestoreTPL() to ensure
// that we have exclusive access to the shared state.
//
DisableInterrupts ();

because gBS->RestoreTPL will unconditionally enable interrupts if
InterruptedTPL < TPL_HIGH_LEVEL.


If possible, the easiest solution would be to merge
NestedInterruptTplLib into Core DXE.  This way, instead of calling
gBS->RestoreTPL, NestedInterruptTplLib can call a custom version of
CoreRestoreTpl that exits with interrupts disabled.  That is, something
like

VOID EFIAPI CoreRestoreTplInternal(IN EFI_TPL NewTpl,
 IN BOOLEAN InterruptState)
{
//
// The caller can request disabled interrupts to access shared
// state, but TPL_HIGH_LEVEL must *not* have them enabled.
//
ASSERT(!(NewTpl == TPL_HIGH_LEVEL && InterruptState));

// ...


Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael Brown

On 29/02/2024 19:26, Michael D Kinney wrote:

I think one advantage of this new proposal is to prevent
an extra level of nesting and use of stack resources in
that extra level.


I think that's a negligible benefit.  In the scenario as I outlined for 
NestedInterruptTplLib, there is potentially one more interrupt stack 
frame, but in that case the inner handler can only consume a small and 
fixed amount of stack space since it will not call RestoreTPL() (and so 
will not dispatch events, etc).


Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116188): https://edk2.groups.io/g/devel/message/116188
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael Brown

On 29/02/2024 19:09, Michael D Kinney wrote:

"When the DXE Foundation is notified that the EFI_CPU_ARCH_PROTOCOL has
been installed, then the full version of the Boot Service RestoreTPL()
can be made available.  When an attempt is made to restore the TPL
level
to level below EFI_TPL_HIGH_LEVEL, then the DXE Foundation should use
the services of the EFI_CPU_ARCH_PROTOCOL to enable interrupts."


I would claim that this spec is perhaps incomplete in this area that
that incomplete description is what allows the window for interrupt
nesting to occur.  This language is correct for UEFI code that calls
Raise/Restore TPL once the CPU Arch Protocol is available.  It does
not cover the required behavior to prevent nesting when processing
a timer interrupt.  This could be considered a gap in the UEFI/PI
spec content.


I think it's important that we don't phrase it as preventing interrupt 
nesting.  The UEFI design *requires* that nested interrupts be allowed 
to happen, since callbacks at TPL_CALLBACK are allowed to wait for 
events at TPL_NOTIFY, and this can't happen without the existence of 
nested interrupts.


The problem is not nested interrupts per se: the problem is the 
potential for unlimited stack consumption.



- How does the proposed patch react to an interrupt occurring
(illegally) at TPL_HIGH_LEVEL (as happens with some versions of
Windows)?  As far as I can tell, it will result in mInterruptedTplMask
having bit 31 set but never cleared.  What impact will this have?


This behavior could potentially break any UEFI code that sets TPL to
TPL_HIGH_LEVEL as a lock, which can then cause any number of
undefined behaviors.  I am curious of you have a way to reproduce
this failure for testing purposed.

I would agree that any proposed change needs to comprehend this
Scenario if it can be reproduced with shipping OS images.


https://bugzilla.redhat.com/show_bug.cgi?id=2189136 was the original bug 
report in which it was discovered that Windows 11 would call 
RaiseTPL(TPL_HIGH_LEVEL) and then enable interrupts using the STI 
instruction.


It would be interesting to hear from anyone at Microsoft as to why this 
happens!



- How does the proposed patch react to potentially mismatched
RaisedTPL()/RestoreTPL() calls (e.g. oldTpl = RaiseTPL(TPL_CALLBACK)
followed by RaiseTPL(TPL_NOTIFY) followed by a single
RestoreTPL(oldTpl))?


The proposed patch only changes behavior when processing a timer
interrupt.  I do not think there would be any changes in behavior
for UEFI code that makes that sequence of calls.


The patch affects all callers of RaiseTPL() and RestoreTPL().  Given 
that it creates a new piece of shared state (mInterruptedTplMask), I'd 
like to see some kind of proof that it can correctly handle an arbitrary 
sequence of calls from unknown third-party code.


For example: consider an interrupt at TPL_APPLICATION with a third-party 
timer interrupt handler that does something like:


  OldTpl = RaiseTPL (TPL_HIGH_LEVEL);

  ... send EOI, call timer tick function, etc ...

  if (OldTpl < TPL_NOTIFY) {
RestoreTPL (TPL_NOTIFY);
... do some weird OEM-specific thing ...
  }

  RestoreTPL ( OldTpl );

This is arguably a valid sequence of calls to RaiseTPL()/RestoreTPL(). 
With the patch as-is, mInterruptedTplMask will have flagged the 
TPL_APPLICATION bit but not the TPL_NOTIFY bit, and so the call to 
RestoreTPL(TPL_NOTIFY) *will* re-enable interrupts, which is against the 
intention of the patch.


Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116187): https://edk2.groups.io/g/devel/message/116187
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael D Kinney
I think one advantage of this new proposal is to prevent 
an extra level of nesting and use of stack resources in 
that extra level.

The nesting depth is then both predictable and minimized
for a given set of supported TPL levels.

Mike

> -Original Message-
> From: Michael Brown 
> Sent: Thursday, February 29, 2024 11:22 AM
> To: devel@edk2.groups.io; pbonz...@redhat.com; Ni, Ray
> 
> Cc: Kinney, Michael D ; Liming Gao
> ; Laszlo Ersek 
> Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> overflow issue due to nested interrupts
> 
> On 29/02/2024 19:04, Paolo Bonzini wrote:
> > On 2/29/24 14:02, Ray Ni wrote:
> >> In the end, it will lower the TPL to TPL_APPLICATION with interrupt
> >> enabled.
> >>
> >> However, it's possible that another timer interrupt happens just in
> >> the end
> >> of RestoreTPL() function when TPL is TPL_APPLICATION.
> >
> > How do non-OVMF platforms solve the issue?  Do they just have the
> same
> > bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?
> 
> Yes, they have that bug (or one of the bugs mentioned in that long
> discussion, depending on which particular implementation choices have
> been made).
> 
> > The design of NestedInterruptTplLib is that each nested interrupt
> must
> > increase the TPL, but if I understand correctly there is a hole here:
> >
> >    //
> >    // Call RestoreTPL() to allow event notifications to be
> >    // dispatched.  This will implicitly re-enable interrupts.
> >    //
> >    gBS->RestoreTPL (InterruptedTPL);
> >
> >    //
> >    // Re-disable interrupts after the call to RestoreTPL() to ensure
> >    // that we have exclusive access to the shared state.
> >    //
> >    DisableInterrupts ();
> >
> > because gBS->RestoreTPL will unconditionally enable interrupts if
> > InterruptedTPL < TPL_HIGH_LEVEL.
> 
> There's no hole there.
> 
> Yes, interrupts will be temporarily reenabled, but the whole function
> of
> NestedInterruptTplLib is to safely allow for this window in which
> interrupts have been (annoyingly) enabled by RestoreTPL().
> 
> If another interrupt *does* occur within that window, the inner
> interrupt handler will call NestedInterruptRestoreTPL(), which will
> take
> the code path leading to the "DEFERRAL INVOCATION POINT", and will
> therefore *not* call RestoreTPL() within that stack frame.  The inner
> interrupt returns at TPL_HIGH_LEVEL with interrupts still disabled, and
> execution is therefore guaranteed to immediately reach the "DEFERRAL
> RETURN POINT" in the outer interrupt handler.  The deferred call to
> RestoreTPL() is then safely executed in the context of the outer
> interrupt handler (i.e. with zero increase in stack usage, hence a
> guarantee of no stack overflow).
> 
> See the comments in the code for further details - I made them fairly
> extensive.  :)
> 
> > If possible, the easiest solution would be to merge
> > NestedInterruptTplLib into Core DXE.
> 
> The question with that approach would be how to cleanly violate the
> abstraction layer that separates the timer interrupt handler (existing
> in a separate DXE driver executable) from the implementation of
> CoreRestoreTplInternal() (existing in core DXE and not exposed via the
> boot services table).
> 
> Thanks,
> 
> Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116186): https://edk2.groups.io/g/devel/message/116186
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael Brown

On 29/02/2024 19:04, Paolo Bonzini wrote:

On 2/29/24 14:02, Ray Ni wrote:
In the end, it will lower the TPL to TPL_APPLICATION with interrupt 
enabled.


However, it's possible that another timer interrupt happens just in 
the end

of RestoreTPL() function when TPL is TPL_APPLICATION.


How do non-OVMF platforms solve the issue?  Do they just have the same 
bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?


Yes, they have that bug (or one of the bugs mentioned in that long 
discussion, depending on which particular implementation choices have 
been made).


The design of NestedInterruptTplLib is that each nested interrupt must 
increase the TPL, but if I understand correctly there is a hole here:


   //
   // Call RestoreTPL() to allow event notifications to be
   // dispatched.  This will implicitly re-enable interrupts.
   //
   gBS->RestoreTPL (InterruptedTPL);

   //
   // Re-disable interrupts after the call to RestoreTPL() to ensure
   // that we have exclusive access to the shared state.
   //
   DisableInterrupts ();

because gBS->RestoreTPL will unconditionally enable interrupts if 
InterruptedTPL < TPL_HIGH_LEVEL.


There's no hole there.

Yes, interrupts will be temporarily reenabled, but the whole function of 
NestedInterruptTplLib is to safely allow for this window in which 
interrupts have been (annoyingly) enabled by RestoreTPL().


If another interrupt *does* occur within that window, the inner 
interrupt handler will call NestedInterruptRestoreTPL(), which will take 
the code path leading to the "DEFERRAL INVOCATION POINT", and will 
therefore *not* call RestoreTPL() within that stack frame.  The inner 
interrupt returns at TPL_HIGH_LEVEL with interrupts still disabled, and 
execution is therefore guaranteed to immediately reach the "DEFERRAL 
RETURN POINT" in the outer interrupt handler.  The deferred call to 
RestoreTPL() is then safely executed in the context of the outer 
interrupt handler (i.e. with zero increase in stack usage, hence a 
guarantee of no stack overflow).


See the comments in the code for further details - I made them fairly 
extensive.  :)


If possible, the easiest solution would be to merge 
NestedInterruptTplLib into Core DXE.


The question with that approach would be how to cleanly violate the 
abstraction layer that separates the timer interrupt handler (existing 
in a separate DXE driver executable) from the implementation of 
CoreRestoreTplInternal() (existing in core DXE and not exposed via the 
boot services table).


Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116185): https://edk2.groups.io/g/devel/message/116185
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael D Kinney


> -Original Message-
> From: Paolo Bonzini 
> Sent: Thursday, February 29, 2024 11:04 AM
> To: Ni, Ray ; devel@edk2.groups.io
> Cc: Kinney, Michael D ; Liming Gao
> ; Laszlo Ersek ; Michael
> Brown 
> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
> due to nested interrupts
> 
> On 2/29/24 14:02, Ray Ni wrote:
> > In the end, it will lower the TPL to TPL_APPLICATION with interrupt
> enabled.
> >
> > However, it's possible that another timer interrupt happens just in
> the end
> > of RestoreTPL() function when TPL is TPL_APPLICATION.
> 
> How do non-OVMF platforms solve the issue?  Do they just have the same
> bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?

Yes.  This same issue can be reproduced on non-OVMF platforms.

This proposal here is an attempt to integrate a common fix into the DXE Core.

I would agree conceptually that integrating the NestedInterruptTplLib work
into the DXE Core is another option.

I believe the root cause of all of these scenarios is enabling interrupts
in RestoreTPL() when processing a timer interrupt between the last processed
event and the return from the interrupt handler. Ther are some instances
of the Timer Arch Protocol implementation that call Raise/Restore TPL, so
we want a DXE Core change that is compatible with the DXE Core doing 
Raise/Restore
when processing a timer interrupt and the Timer Arch Protocol implementation
also doing the Raise/Restore TPL.

> 
> The design of NestedInterruptTplLib is that each nested interrupt must
> increase the TPL, but if I understand correctly there is a hole here:
> 
>//
>// Call RestoreTPL() to allow event notifications to be
>// dispatched.  This will implicitly re-enable interrupts.
>//
>gBS->RestoreTPL (InterruptedTPL);
> 
>//
>// Re-disable interrupts after the call to RestoreTPL() to ensure
>// that we have exclusive access to the shared state.
>//
>DisableInterrupts ();
> 
> because gBS->RestoreTPL will unconditionally enable interrupts if
> InterruptedTPL < TPL_HIGH_LEVEL.
> 
> 
> If possible, the easiest solution would be to merge
> NestedInterruptTplLib into Core DXE.  This way, instead of calling
> gBS->RestoreTPL, NestedInterruptTplLib can call a custom version of
> CoreRestoreTpl that exits with interrupts disabled.  That is, something
> like
> 
> VOID EFIAPI CoreRestoreTplInternal(IN EFI_TPL NewTpl,
> IN BOOLEAN InterruptState)
> {
>//
>// The caller can request disabled interrupts to access shared
>// state, but TPL_HIGH_LEVEL must *not* have them enabled.
>//
>ASSERT(!(NewTpl == TPL_HIGH_LEVEL && InterruptState));
> 
>// ...
> 
>gEfiCurrentTpl = NewTpl;
>CoreSetInterruptState (InterruptState);
> }
> 
> Now, CoreRestoreTpl is just
> 
>//
>// If lowering below HIGH_LEVEL, make sure
>// interrupts are enabled
>//
>CoreRestoreTplInternal(NewTpl, NewTpl < TPL_HIGH_LEVEL);
> 
> whereas NestedInterruptRestoreTPL can do
> 
>//
>// Call RestoreTPL() to allow event notifications to be
>// dispatched.  This will implicitly re-enable interrupts,
>// but only if events have to be dispatched.
>//
>CoreRestoreTplInternal(InterruptedTPL, FALSE);
> 
>//
>// Interrupts are now disabled, so we can access shared state.
>//
> 
> This avoids the unlimited nesting of interrupts because each stack
> frame
> will indeed have a higher TPL than the outer version.
> 
> Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116184): https://edk2.groups.io/g/devel/message/116184
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael D Kinney
> -Original Message-
> From: Michael Brown 
> Sent: Thursday, February 29, 2024 9:39 AM
> To: Kinney, Michael D ;
> devel@edk2.groups.io; Ni, Ray 
> Cc: Liming Gao ; Laszlo Ersek
> ; Paolo Bonzini 
> Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> overflow issue due to nested interrupts
> 
> On 29/02/2024 16:43, Kinney, Michael D wrote:
> > Hi Michael,
> >
> > Can you provide a pointer to the UEFI Spec statement this breaks?
> 
> II-9.7.1.3 RestoreTPL():
> 
> "When the DXE Foundation is notified that the EFI_CPU_ARCH_PROTOCOL has
> been installed, then the full version of the Boot Service RestoreTPL()
> can be made available.  When an attempt is made to restore the TPL
> level
> to level below EFI_TPL_HIGH_LEVEL, then the DXE Foundation should use
> the services of the EFI_CPU_ARCH_PROTOCOL to enable interrupts."

I would claim that this spec is perhaps incomplete in this area that
that incomplete description is what allows the window for interrupt
nesting to occur.  This language is correct for UEFI code that calls
Raise/Restore TPL once the CPU Arch Protocol is available.  It does
not cover the required behavior to prevent nesting when processing
a timer interrupt.  This could be considered a gap in the UEFI/PI
spec content.

> 
> I suspect this is sufficient to veto the proposed design, though we
> could argue that the loosely worded "should" is technically not "must".
> 
> 
> If we still want to proceed with this design, then I have several other
> questions:
> 
> - How does the proposed patch react to an interrupt occurring
> (illegally) at TPL_HIGH_LEVEL (as happens with some versions of
> Windows)?  As far as I can tell, it will result in mInterruptedTplMask
> having bit 31 set but never cleared.  What impact will this have?

This behavior could potentially break any UEFI code that sets TPL to
TPL_HIGH_LEVEL as a lock, which can then cause any number of 
undefined behaviors.  I am curious of you have a way to reproduce 
this failure for testing purposed.

I would agree that any proposed change needs to comprehend this
Scenario if it can be reproduced with shipping OS images.

> 
> - How does the proposed patch react to potentially mismatched
> RaisedTPL()/RestoreTPL() calls (e.g. oldTpl = RaiseTPL(TPL_CALLBACK)
> followed by RaiseTPL(TPL_NOTIFY) followed by a single
> RestoreTPL(oldTpl))?

The proposed patch only changes behavior when processing a timer
interrupt.  I do not think there would be any changes in behavior
for UEFI code that makes that sequence of calls.  

> 
> 
> I believe the proposed patch is attempting to establish a new invariant
> as follows:
> 
> Once an interrupt has occured at a given TPL, then that *TPL* is
> conceptually considered to be in an "interrupted" state.  The *only*
> thing that can clear this "interrupted" state from the TPL is to return
> from the interrupt handler.
> 
> Note that this conceptual definition does not perfectly align with the
> bit flags in mInterruptedTplMask, since those bits will necessarily be
> set only some time after the interrupt occurs, and will have to be
> cleared before returning from the interrupt.  However, it is the
> conceptual definition that is relevant to the invariant.
> 
> The new invariant is that no code may execute at an "interrupted" TPL
> with interrupts enabled.  It is legitimate for code to raise to a
> higher
> TPL and to enable interrupts while there, and it is legitimate for code
> to execute in an "interrupted" TPL with interrupts disabled, but it is
> not legitimate for any code to reenable interrupts while still at an
> "interrupted" TPL.
> 
> It would be good to call out this invariant explicitly, so that authors
> of interrupt handlers are aware of the restrictions.  It would also
> clarify some of the logic (e.g. it provides the reason why interrupts
> must be disabled before lowering gEfiCurrentTpl in CoreRestoreTpl()).
> 
> It's also generally easier to reason about a stated invariant than to
> extrapolate from a list of complicated examples.

I agree that the proposed code change should describe the change in 
this way, and that the examples currently included in comments would
be better in a BZ.

> 
> Thanks,
> 
> Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116183): https://edk2.groups.io/g/devel/message/116183
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael D Kinney
Hi Paolo,

The proposed change does not disable interrupts at TPL below TPL_HIGH_LEVEL 
when processing event handlers.

It only prevents interrupts being enabled in the window from the last event 
processed in a timer interrupt and the return from the timer interrupt handler.

This is a window where the only control flows are in the DXE Core and the exit 
of the timer interrupt handler.

Mike

From: devel@edk2.groups.io  On Behalf Of Paolo Bonzini
Sent: Thursday, February 29, 2024 9:40 AM
To: Kinney, Michael D 
Cc: Michael Brown ; edk2-devel-groups-io 
; Ni, Ray ; Liming Gao 
; Laszlo Ersek 
Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow 
issue due to nested interrupts


Il gio 29 feb 2024, 17:45 Kinney, Michael D 
mailto:michael.d.kin...@intel.com>> ha scritto:
Hi Michael,

Can you provide a pointer to the UEFI Spec statement this breaks?

The spec does say that interrupts are disabled for TPL_HIGH_LEVEL, but indeed 
it doesn't say they are always enabled at lower levels. However, if the 
interrupts aren't always enabled whenever you're below TPL_HIGH_LEVEL, you get 
priority inversions (and deadlocks).

For example, if you end up running with interrupts disabled at TPL_CALLBACK, 
you are disabling the dispatching of timers at TPL_NOTIFY.

I guess this can be deduced from these two passages:

- "The functions in these queues are invoked in FIFO order, starting with the 
highest priority level queue and proceeding to the lowest priority queue that 
is unmasked by the current TPL"

- "If Type is TimerRelative and TriggerTime is 0, then the timer event will be 
signaled on the next timer tick" (in the description of gBS->SetTimer)

Paolo


Thanks,

Mike

> -Original Message-
> From: Michael Brown mailto:mc...@ipxe.org>>
> Sent: Thursday, February 29, 2024 5:23 AM
> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Ni, Ray 
> mailto:ray...@intel.com>>
> Cc: Kinney, Michael D 
> mailto:michael.d.kin...@intel.com>>; Liming Gao
> mailto:gaolim...@byosoft.com.cn>>; Laszlo Ersek 
> mailto:ler...@redhat.com>>; Paolo
> Bonzini mailto:pbonz...@redhat.com>>
> Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> overflow issue due to nested interrupts
>
> On 29/02/2024 13:02, Ni, Ray wrote:
> > A ideal solution is to not keep the interrupt disabled when
> > RestoreTPL(TPL_HIGH -> not TPL_HIGH) is executed in the timer
> interrupt
> > context because the interrupt handler will re-enable the interrupt
> with
> > arch specific instructions (e.g.: IRET for x86).
> >
> > The patch introduces mInterruptedTplMask which tells RestoreTPL() if
> > it's called in the interrupt context and whether it should defer
> enabling
> > the interrupt.
>
> NACK.  This breaks the specification-defined behaviour for
> RestoreTPL().
>
> What guarantees do we have that there is no code anywhere in the world
> that relies upon RestoreTPL() unconditionally re-enabling interrupts.
>
> I also find this code substantially harder to follow than
> NestedInterruptTplLib (which does not break any specified behaviour).
>
> Thanks,
>
> Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116182): https://edk2.groups.io/g/devel/message/116182
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Paolo Bonzini

On 2/29/24 14:02, Ray Ni wrote:

In the end, it will lower the TPL to TPL_APPLICATION with interrupt enabled.

However, it's possible that another timer interrupt happens just in the end
of RestoreTPL() function when TPL is TPL_APPLICATION.


How do non-OVMF platforms solve the issue?  Do they just have the same 
bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?


The design of NestedInterruptTplLib is that each nested interrupt must 
increase the TPL, but if I understand correctly there is a hole here:


  //
  // Call RestoreTPL() to allow event notifications to be
  // dispatched.  This will implicitly re-enable interrupts.
  //
  gBS->RestoreTPL (InterruptedTPL);

  //
  // Re-disable interrupts after the call to RestoreTPL() to ensure
  // that we have exclusive access to the shared state.
  //
  DisableInterrupts ();

because gBS->RestoreTPL will unconditionally enable interrupts if 
InterruptedTPL < TPL_HIGH_LEVEL.



If possible, the easiest solution would be to merge 
NestedInterruptTplLib into Core DXE.  This way, instead of calling 
gBS->RestoreTPL, NestedInterruptTplLib can call a custom version of 
CoreRestoreTpl that exits with interrupts disabled.  That is, something like


VOID EFIAPI CoreRestoreTplInternal(IN EFI_TPL NewTpl,
   IN BOOLEAN InterruptState)
{
  //
  // The caller can request disabled interrupts to access shared
  // state, but TPL_HIGH_LEVEL must *not* have them enabled.
  //
  ASSERT(!(NewTpl == TPL_HIGH_LEVEL && InterruptState));

  // ...

  gEfiCurrentTpl = NewTpl;
  CoreSetInterruptState (InterruptState);
}

Now, CoreRestoreTpl is just

  //
  // If lowering below HIGH_LEVEL, make sure
  // interrupts are enabled
  //
  CoreRestoreTplInternal(NewTpl, NewTpl < TPL_HIGH_LEVEL);

whereas NestedInterruptRestoreTPL can do

  //
  // Call RestoreTPL() to allow event notifications to be
  // dispatched.  This will implicitly re-enable interrupts,
  // but only if events have to be dispatched.
  //
  CoreRestoreTplInternal(InterruptedTPL, FALSE);

  //
  // Interrupts are now disabled, so we can access shared state.
  //

This avoids the unlimited nesting of interrupts because each stack frame 
will indeed have a higher TPL than the outer version.


Paolo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116181): https://edk2.groups.io/g/devel/message/116181
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Paolo Bonzini
Il gio 29 feb 2024, 17:45 Kinney, Michael D  ha
scritto:

> Hi Michael,
>
> Can you provide a pointer to the UEFI Spec statement this breaks?
>

The spec does say that interrupts are disabled for TPL_HIGH_LEVEL, but
indeed it doesn't say they are always enabled at lower levels. However, if
the interrupts aren't always enabled whenever you're below TPL_HIGH_LEVEL,
you get priority inversions (and deadlocks).

For example, if you end up running with interrupts disabled at
TPL_CALLBACK, you are disabling the dispatching of timers at TPL_NOTIFY.

I guess this can be deduced from these two passages:

- "The functions in these queues are invoked in FIFO order, starting with
the highest priority level queue and proceeding to the lowest priority
queue that is unmasked by the current TPL"

- "If Type is TimerRelative and TriggerTime is 0, then the timer event will
be signaled on the next timer tick" (in the description of gBS->SetTimer)

Paolo


> Thanks,
>
> Mike
>
> > -Original Message-
> > From: Michael Brown 
> > Sent: Thursday, February 29, 2024 5:23 AM
> > To: devel@edk2.groups.io; Ni, Ray 
> > Cc: Kinney, Michael D ; Liming Gao
> > ; Laszlo Ersek ; Paolo
> > Bonzini 
> > Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> > overflow issue due to nested interrupts
> >
> > On 29/02/2024 13:02, Ni, Ray wrote:
> > > A ideal solution is to not keep the interrupt disabled when
> > > RestoreTPL(TPL_HIGH -> not TPL_HIGH) is executed in the timer
> > interrupt
> > > context because the interrupt handler will re-enable the interrupt
> > with
> > > arch specific instructions (e.g.: IRET for x86).
> > >
> > > The patch introduces mInterruptedTplMask which tells RestoreTPL() if
> > > it's called in the interrupt context and whether it should defer
> > enabling
> > > the interrupt.
> >
> > NACK.  This breaks the specification-defined behaviour for
> > RestoreTPL().
> >
> > What guarantees do we have that there is no code anywhere in the world
> > that relies upon RestoreTPL() unconditionally re-enabling interrupts.
> >
> > I also find this code substantially harder to follow than
> > NestedInterruptTplLib (which does not break any specified behaviour).
> >
> > Thanks,
> >
> > Michael
>
>


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116179): https://edk2.groups.io/g/devel/message/116179
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael Brown

On 29/02/2024 16:43, Kinney, Michael D wrote:

Hi Michael,

Can you provide a pointer to the UEFI Spec statement this breaks?


II-9.7.1.3 RestoreTPL():

"When the DXE Foundation is notified that the EFI_CPU_ARCH_PROTOCOL has 
been installed, then the full version of the Boot Service RestoreTPL() 
can be made available.  When an attempt is made to restore the TPL level 
to level below EFI_TPL_HIGH_LEVEL, then the DXE Foundation should use 
the services of the EFI_CPU_ARCH_PROTOCOL to enable interrupts."


I suspect this is sufficient to veto the proposed design, though we 
could argue that the loosely worded "should" is technically not "must".



If we still want to proceed with this design, then I have several other 
questions:


- How does the proposed patch react to an interrupt occurring 
(illegally) at TPL_HIGH_LEVEL (as happens with some versions of 
Windows)?  As far as I can tell, it will result in mInterruptedTplMask 
having bit 31 set but never cleared.  What impact will this have?


- How does the proposed patch react to potentially mismatched 
RaisedTPL()/RestoreTPL() calls (e.g. oldTpl = RaiseTPL(TPL_CALLBACK) 
followed by RaiseTPL(TPL_NOTIFY) followed by a single RestoreTPL(oldTpl))?



I believe the proposed patch is attempting to establish a new invariant 
as follows:


Once an interrupt has occured at a given TPL, then that *TPL* is 
conceptually considered to be in an "interrupted" state.  The *only* 
thing that can clear this "interrupted" state from the TPL is to return 
from the interrupt handler.


Note that this conceptual definition does not perfectly align with the 
bit flags in mInterruptedTplMask, since those bits will necessarily be 
set only some time after the interrupt occurs, and will have to be 
cleared before returning from the interrupt.  However, it is the 
conceptual definition that is relevant to the invariant.


The new invariant is that no code may execute at an "interrupted" TPL 
with interrupts enabled.  It is legitimate for code to raise to a higher 
TPL and to enable interrupts while there, and it is legitimate for code 
to execute in an "interrupted" TPL with interrupts disabled, but it is 
not legitimate for any code to reenable interrupts while still at an 
"interrupted" TPL.


It would be good to call out this invariant explicitly, so that authors 
of interrupt handlers are aware of the restrictions.  It would also 
clarify some of the logic (e.g. it provides the reason why interrupts 
must be disabled before lowering gEfiCurrentTpl in CoreRestoreTpl()).


It's also generally easier to reason about a stated invariant than to 
extrapolate from a list of complicated examples.


Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116178): https://edk2.groups.io/g/devel/message/116178
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael D Kinney
Hi Michael,

Can you provide a pointer to the UEFI Spec statement this breaks?

Thanks,

Mike 

> -Original Message-
> From: Michael Brown 
> Sent: Thursday, February 29, 2024 5:23 AM
> To: devel@edk2.groups.io; Ni, Ray 
> Cc: Kinney, Michael D ; Liming Gao
> ; Laszlo Ersek ; Paolo
> Bonzini 
> Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack
> overflow issue due to nested interrupts
> 
> On 29/02/2024 13:02, Ni, Ray wrote:
> > A ideal solution is to not keep the interrupt disabled when
> > RestoreTPL(TPL_HIGH -> not TPL_HIGH) is executed in the timer
> interrupt
> > context because the interrupt handler will re-enable the interrupt
> with
> > arch specific instructions (e.g.: IRET for x86).
> >
> > The patch introduces mInterruptedTplMask which tells RestoreTPL() if
> > it's called in the interrupt context and whether it should defer
> enabling
> > the interrupt.
> 
> NACK.  This breaks the specification-defined behaviour for
> RestoreTPL().
> 
> What guarantees do we have that there is no code anywhere in the world
> that relies upon RestoreTPL() unconditionally re-enabling interrupts.
> 
> I also find this code substantially harder to follow than
> NestedInterruptTplLib (which does not break any specified behaviour).
> 
> Thanks,
> 
> Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116177): https://edk2.groups.io/g/devel/message/116177
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts

2024-02-29 Thread Michael Brown

On 29/02/2024 13:02, Ni, Ray wrote:

A ideal solution is to not keep the interrupt disabled when
RestoreTPL(TPL_HIGH -> not TPL_HIGH) is executed in the timer interrupt
context because the interrupt handler will re-enable the interrupt with
arch specific instructions (e.g.: IRET for x86).

The patch introduces mInterruptedTplMask which tells RestoreTPL() if
it's called in the interrupt context and whether it should defer enabling
the interrupt.


NACK.  This breaks the specification-defined behaviour for RestoreTPL().

What guarantees do we have that there is no code anywhere in the world 
that relies upon RestoreTPL() unconditionally re-enabling interrupts.


I also find this code substantially harder to follow than 
NestedInterruptTplLib (which does not break any specified behaviour).


Thanks,

Michael



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116168): https://edk2.groups.io/g/devel/message/116168
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-