Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-10 Thread Nathan Hartman
On Mon, Dec 9, 2024 at 9:46 PM Xiang Xiao  wrote:
>
> On Tue, Dec 10, 2024 at 5:53 AM Nathan Hartman 
> wrote:
>
> > Hi all,
> >
> > Unfortunately I missed the PR before it was merged, but PR-15073 has
> > broken High Priority, Zero Latency Interrupts! Fortunately I caught it
> > now. It was merged 17 hours ago.
> >
> >
> Zero Latency Interrupts doesn't remove from PR-15073.
>
>
> > This PR removed a very important Kconfig: config ARMV7M_USEBASEPRI,
> > and much of the logic related to it. This config is *required* when
> > using CONFIG_ARCH_HIPRI_INTERRUPT.
> >
> >
> PR-15073 doesn't remove ARMV7M_USEBASEPRI, but makes it enabled
> unconditionally, that's why this patch removes this config.


Correct. I am writing to confirm that the PRs are correct and did not
break the Zero Latency Interrupts. The PRs remove support for PRIMASK,
not BASEPRI, on architectures that support BASEPRI. There is no need
to support PRIMASK on these architectures because BASEPRI is a
superset of the functionality. PRIMASK is not needed for Zero Latency
Interrupts, so there is no need to maintain two different
implementations.

Thanks for your patience!

Nathan


Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-10 Thread Nathan Hartman
On Tue, Dec 10, 2024 at 9:06 AM Gregory Nutt  wrote:

> On 12/10/2024 7:59 AM, Nathan Hartman wrote:
> > On Tue, Dec 10, 2024 at 8:11 AM spudaneco  wrote:
> >
> >> I wonder if we should also disable the prioritization APIs.  In the
> >> normal, default case, any reprioritization of an IRQ introduces a fatal,
> >> mysterious bug.  That is because nested interrupts will occur and may
> not
> >> be supported.Sent from my Galaxy
> >
> > Aren't the prioritization APIs needed for setting up the Zero Latency
> > Interrupts? They exist "outside" the OS so they don't touch any OS
> > structures, so no mysterious bug should occur.
>
> Yes, I tried to distinguish that case with some weasel words (normal,
> default).  Even if high priority interrupts are used (maybe enabled),
> the same fatal error would occur if any other interrupts that share the
> same handler were reprioritized



True. With shared interrupt handlers, ALL the interrupts that share the
same handler must have the same priority or else they might corrupt the
shared data structures.


Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-10 Thread Gregory Nutt

On 12/10/2024 7:59 AM, Nathan Hartman wrote:

On Tue, Dec 10, 2024 at 8:11 AM spudaneco  wrote:


I wonder if we should also disable the prioritization APIs.  In the
normal, default case, any reprioritization of an IRQ introduces a fatal,
mysterious bug.  That is because nested interrupts will occur and may not
be supported.Sent from my Galaxy


Aren't the prioritization APIs needed for setting up the Zero Latency
Interrupts? They exist "outside" the OS so they don't touch any OS
structures, so no mysterious bug should occur.


Yes, I tried to distinguish that case with some weasel words (normal, 
default).  Even if high priority interrupts are used (maybe enabled), 
the same fatal error would occur if any other interrupts that share the 
same handler were reprioritized




Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-10 Thread Nathan Hartman
On Tue, Dec 10, 2024 at 8:11 AM spudaneco  wrote:

> I wonder if we should also disable the prioritization APIs.  In the
> normal, default case, any reprioritization of an IRQ introduces a fatal,
> mysterious bug.  That is because nested interrupts will occur and may not
> be supported.Sent from my Galaxy



Aren't the prioritization APIs needed for setting up the Zero Latency
Interrupts? They exist "outside" the OS so they don't touch any OS
structures, so no mysterious bug should occur.

Maybe the prioritization APIs should be enabled with
CONFIG_ARCH_HIPRI_INTERRUPT?

And the docstrings can explain not to use them for interrupts handled by
the OS.

Cheers,
Nathan


Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-10 Thread spudaneco
I wonder if we should also disable the prioritization APIs.  In the normal, 
default case, any reprioritization of an IRQ introduces a fatal, mysterious 
bug.  That is because nested interrupts will occur and may not be 
supported.Sent from my Galaxy
 Original message From: Nathan Hartman 
 Date: 12/9/24  10:11 PM  (GMT-06:00) To: 
dev@nuttx.apache.org Subject: Re: SERIOUS BUG: Zero Latency Interrupts are 
broken! On Mon, Dec 9, 2024 at 9:46 PM Xiang Xiao  
wrote:>> On Tue, Dec 10, 2024 at 5:53 AM Nathan Hartman 
> wrote:>> > Hi all,> >> > Unfortunately I missed the 
PR before it was merged, but PR-15073 has> > broken High Priority, Zero Latency 
Interrupts! Fortunately I caught it> > now. It was merged 17 hours ago.> >> >> 
Zero Latency Interrupts doesn't remove from PR-15073.>>> > This PR removed a 
very important Kconfig: config ARMV7M_USEBASEPRI,> > and much of the logic 
related to it. This config is *required* when> > using 
CONFIG_ARCH_HIPRI_INTERRUPT.> >> >> PR-15073 doesn't remove ARMV7M_USEBASEPRI, 
but makes it enabled> unconditionally, that's why this patch removes this 
config.Thank you.I studied the changes of #15073 and #15086 and they seem 
correct, toenable BASEPRI always in armv7m and armv8m. So, the Zero 
LatencyInterrupts should work. I'll test it in the morning to be sure.tl;dr: 
Please don't revert those PRs.I am still studying the #14881 changes. This one 
is more complexbecause it changes how the recordkeeping is done.Unfortunately 
it is very late now so I need to continue in the morning.> 
https://github.com/apache/nuttx/pull/15102 removes ARMV7M_USEBASEPRI from> 
documentation to match the new code base.Thanks for the PR. I will review 
soon.Cheers,Nathan

Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-09 Thread Nathan Hartman
On Mon, Dec 9, 2024 at 9:46 PM Xiang Xiao  wrote:
>
> On Tue, Dec 10, 2024 at 5:53 AM Nathan Hartman 
> wrote:
>
> > Hi all,
> >
> > Unfortunately I missed the PR before it was merged, but PR-15073 has
> > broken High Priority, Zero Latency Interrupts! Fortunately I caught it
> > now. It was merged 17 hours ago.
> >
> >
> Zero Latency Interrupts doesn't remove from PR-15073.
>
>
> > This PR removed a very important Kconfig: config ARMV7M_USEBASEPRI,
> > and much of the logic related to it. This config is *required* when
> > using CONFIG_ARCH_HIPRI_INTERRUPT.
> >
> >
> PR-15073 doesn't remove ARMV7M_USEBASEPRI, but makes it enabled
> unconditionally, that's why this patch removes this config.

Thank you.

I studied the changes of #15073 and #15086 and they seem correct, to
enable BASEPRI always in armv7m and armv8m. So, the Zero Latency
Interrupts should work. I'll test it in the morning to be sure.

tl;dr: Please don't revert those PRs.

I am still studying the #14881 changes. This one is more complex
because it changes how the recordkeeping is done.

Unfortunately it is very late now so I need to continue in the morning.

> https://github.com/apache/nuttx/pull/15102 removes ARMV7M_USEBASEPRI from
> documentation to match the new code base.

Thanks for the PR. I will review soon.

Cheers,
Nathan


Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-09 Thread Xiang Xiao
On Tue, Dec 10, 2024 at 5:53 AM Nathan Hartman 
wrote:

> Hi all,
>
> Unfortunately I missed the PR before it was merged, but PR-15073 has
> broken High Priority, Zero Latency Interrupts! Fortunately I caught it
> now. It was merged 17 hours ago.
>
>
Zero Latency Interrupts doesn't remove from PR-15073.


> This PR removed a very important Kconfig: config ARMV7M_USEBASEPRI,
> and much of the logic related to it. This config is *required* when
> using CONFIG_ARCH_HIPRI_INTERRUPT.
>
>
PR-15073 doesn't remove ARMV7M_USEBASEPRI, but makes it enabled
unconditionally, that's why this patch removes this config.


> High Priority, Zero Latency Interrupts are documented here:
> https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html
>
>
https://github.com/apache/nuttx/pull/15102 removes ARMV7M_USEBASEPRI from
documentation to match the new code base.


> I depend strongly on Zero Latency Interrupts. It is *required* for
> critical handling in my firmware. I cannot use NuttX at all if NuttX
> does not have Zero Latency Interrupts anymore.
>
> PR-15073 was meant to fix a regression in PR-14881. PR-14881 attempted
> to optimize handling of g_current_regs. The commit log of PR-15073
> explains this. [1]
>
> We have to find another way to fix it, without destroying the Zero
> Latency Interrupts feature. Otherwise, it is a very serious problem.
>
> I suggest to revert PR-15073, and possibly revert PR-14881, and find
> another way to implement PR-14881 that doesn't cause this problem.
>
> References:
>
> [1] PR-15073:
> https://github.com/apache/nuttx/commit/0e1b432dd068880d353ae91c01e610f85b4a16ea#diff-24754055e0d9f2d702b8503e7f5b520394cb59affb5d24457fe0641c1d7678ceL27
>
> [2] PR-14881: https://github.com/apache/nuttx/pull/14881
>
> [3] High Priority Zero Latency Interrupts:
> https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html
>
> Thanks,
> Nathan
>


Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-09 Thread Gregory Nutt



On 12/9/2024 3:52 PM, Nathan Hartman wrote:

Hi all,

Unfortunately I missed the PR before it was merged, but PR-15073 has
broken High Priority, Zero Latency Interrupts! Fortunately I caught it
now. It was merged 17 hours ago.

This PR removed a very important Kconfig: config ARMV7M_USEBASEPRI,
and much of the logic related to it. This config is *required* when
using CONFIG_ARCH_HIPRI_INTERRUPT.

High Priority, Zero Latency Interrupts are documented here:
https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html

I depend strongly on Zero Latency Interrupts. It is *required* for
critical handling in my firmware. I cannot use NuttX at all if NuttX
does not have Zero Latency Interrupts anymore.

PR-15073 was meant to fix a regression in PR-14881. PR-14881 attempted
to optimize handling of g_current_regs. The commit log of PR-15073
explains this. [1]

We have to find another way to fix it, without destroying the Zero
Latency Interrupts feature. Otherwise, it is a very serious problem.

I suggest to revert PR-15073, and possibly revert PR-14881, and find
another way to implement PR-14881 that doesn't cause this problem.

References:

[1] PR-15073: 
https://github.com/apache/nuttx/commit/0e1b432dd068880d353ae91c01e610f85b4a16ea#diff-24754055e0d9f2d702b8503e7f5b520394cb59affb5d24457fe0641c1d7678ceL27

[2] PR-14881: https://github.com/apache/nuttx/pull/14881

[3] High Priority Zero Latency Interrupts:
https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html

Thanks,
Nathan


Using the basepri register would also be important if you ever want to 
implement nested interrupts.  I purist would say that nested, priority 
interrupts is necessary for true real-time performance.  More info:  
https://cwiki.apache.org/confluence/display/NUTTX/Nested+Interrupts


Samsung has a working implementation of this in TizenRT: The are about 3 
or 4 commits in May of 2019 that provides the full implementation.  It 
should drop into those architectures pretty simply.





Re: SERIOUS BUG: Zero Latency Interrupts are broken!

2024-12-09 Thread Tomek CEDRO
Whoops, good catch Nathan!

+1 to revert all breaking commits and thing for a better solution :-)

I can see you already reported here:
https://github.com/apache/nuttx/issues/15100

Tomek


On Mon, Dec 9, 2024 at 10:53 PM Nathan Hartman  wrote:
>
> Hi all,
>
> Unfortunately I missed the PR before it was merged, but PR-15073 has
> broken High Priority, Zero Latency Interrupts! Fortunately I caught it
> now. It was merged 17 hours ago.
>
> This PR removed a very important Kconfig: config ARMV7M_USEBASEPRI,
> and much of the logic related to it. This config is *required* when
> using CONFIG_ARCH_HIPRI_INTERRUPT.
>
> High Priority, Zero Latency Interrupts are documented here:
> https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html
>
> I depend strongly on Zero Latency Interrupts. It is *required* for
> critical handling in my firmware. I cannot use NuttX at all if NuttX
> does not have Zero Latency Interrupts anymore.
>
> PR-15073 was meant to fix a regression in PR-14881. PR-14881 attempted
> to optimize handling of g_current_regs. The commit log of PR-15073
> explains this. [1]
>
> We have to find another way to fix it, without destroying the Zero
> Latency Interrupts feature. Otherwise, it is a very serious problem.
>
> I suggest to revert PR-15073, and possibly revert PR-14881, and find
> another way to implement PR-14881 that doesn't cause this problem.
>
> References:
>
> [1] PR-15073: 
> https://github.com/apache/nuttx/commit/0e1b432dd068880d353ae91c01e610f85b4a16ea#diff-24754055e0d9f2d702b8503e7f5b520394cb59affb5d24457fe0641c1d7678ceL27
>
> [2] PR-14881: https://github.com/apache/nuttx/pull/14881
>
> [3] High Priority Zero Latency Interrupts:
> https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html
>
> Thanks,
> Nathan



--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info