Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-06-04 Thread Stefan Roese

On 04.06.20 10:28, Rasmus Villemoes wrote:

On 02/06/2020 17.38, Rasmus Villemoes wrote:

On 02/06/2020 16.53, Stefan Roese wrote:

On 02.06.20 15:29, Rasmus Villemoes wrote:

On 16/03/2020 16.52, Rasmus Villemoes wrote:

On 14/03/2020 13.04, Stefan Roese wrote:

On 13.03.20 17:04, Rasmus Villemoes wrote:





But, as I suspected, I do have a problem when loading a compressed
kernel image - what I write above "so even in U-Boot proper, time as
measured by get_timer() ceases to pass after that point, so all the
WATCHDOG_RESET() calls from the inflate code effectively get ignored."
is indeed the case.

So, what's the best way to proceed? Should there be a hook disabling the
   rate-limiting logic that bootm_disable_interrupts() can call? Or must
get_timer() always return a sensible result even with interrupts
disabled?


Wouldn't it make sense to move the bootm_disable_interrupts() call to
after loading and uncompressing the OS image? To right before jumping
to the OS?


No, because the point of disabling interrupts is that we may start
writing to physical address 0 (e.g. if that's the load= address in the
FIT image), which is also where the interrupt vectors reside - i.e.,
we're about to overwrite 0x900 (the decrementer interrupt vector), so if
we don't disable interrupts, we'll crash on the very next decrementer
interrupt (i.e., within one millisecond).


Ah, thanks for refreshing me on this.


FWIW, the below very hacky patch makes get_timer() return sensible
results on ppc even when interrupts are disabled, and hence ensures that
the watchdog does get petted. It's obviously not meant for inclusion as
is (it's prepared for being a proper config option, but for toying
around it's easier to have it all in one file - also, I don't really
like the name of the config knob). But it's also kind of expensive to do
that do_div(), so I'm not sure I think this is even the right approach.


I agree. This does not look as its going into the right direction. But
thanks for working on this anyways.


You previously rejected allowing the board to provide an override for
the rate-limiting,


From my memory, I "just" suggested working on a different, more generic
approach. But if this fails, I'm open to re-visit the options.


and at least the "hw_margin_ms" parsing now solves
part of what I wanted to use that for. What about implementing the
rate-limiting instead in terms of get_ticks() (the hw_margin_ms can
trivially be translated to a number of ticks at init time - there's
already a usec_to_tick helper)? Are there any boards where get_ticks()
doesn't return something sensible?


Could you please send a new version of a patch(set) to address these
issues by using the "override for the rate-limiting" or some other
idea you have right now for this? I'll review it soon'ish.

Thanks,
Stefan


Rasmus

_Not_ for inclusion:

diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c
index 23ac5bca1e..e6b6a967ae 100644
--- a/arch/powerpc/lib/interrupts.c
+++ b/arch/powerpc/lib/interrupts.c
@@ -10,11 +10,14 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #ifdef CONFIG_LED_STATUS
  #include 
  #endif

+#define CONFIG_GET_TIMER_IRQ 1
+
  #ifndef CONFIG_MPC83XX_TIMER
  #ifndef CONFIG_SYS_WATCHDOG_FREQ
  #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2)
@@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val)
  }
  #endif /* !CONFIG_MPC83XX_TIMER */

+static u64 irq_off_ticks = 0;
+static int interrupts_enabled = 0;
+static volatile ulong timestamp = 0;
+
+static u32 irq_off_msecs(void)
+{
+   u64 t;
+   u32 d = get_tbclk();
+
+   if (!d)
+   return 0;
+   t = get_ticks() - irq_off_ticks;
+   t *= 1000;
+   do_div(t, d);
+   return t;
+}
+
  void enable_interrupts(void)
  {
+   ulong msr = get_msr ();
+
+   if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) {
+   /* Account for the time interrupts were off. */
+   timestamp += irq_off_msecs();
+   interrupts_enabled = 1;
+   }
+
set_msr (get_msr () | MSR_EE);
  }

@@ -50,6 +78,13 @@ int disable_interrupts(void)
ulong msr = get_msr ();

set_msr (msr & ~MSR_EE);
+
+   if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) {
+   /* Record when interrupts were disabled. */
+   irq_off_ticks = get_ticks();
+   interrupts_enabled = 0;
+   }
+
return ((msr & MSR_EE) != 0);
  }

@@ -61,13 +96,11 @@ int interrupt_init(void)

set_dec (decrementer_count);

-   set_msr (get_msr () | MSR_EE);
+   enable_interrupts();

return (0);
  }

-static volatile ulong timestamp = 0;
-
  void timer_interrupt(struct pt_regs *regs)
  {
/* call cpu specific function from $(CPU)/interrupts.c */
@@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs)

  ulong get_timer (ulong base)
  {
-   return (timestamp - base);
+   ulong ts = timestamp;
+
+  

Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-06-04 Thread Rasmus Villemoes
On 02/06/2020 17.38, Rasmus Villemoes wrote:
> On 02/06/2020 16.53, Stefan Roese wrote:
>> On 02.06.20 15:29, Rasmus Villemoes wrote:
>>> On 16/03/2020 16.52, Rasmus Villemoes wrote:
 On 14/03/2020 13.04, Stefan Roese wrote:
> On 13.03.20 17:04, Rasmus Villemoes wrote:


>>> But, as I suspected, I do have a problem when loading a compressed
>>> kernel image - what I write above "so even in U-Boot proper, time as
>>> measured by get_timer() ceases to pass after that point, so all the
>>> WATCHDOG_RESET() calls from the inflate code effectively get ignored."
>>> is indeed the case.
>>>
>>> So, what's the best way to proceed? Should there be a hook disabling the
>>>   rate-limiting logic that bootm_disable_interrupts() can call? Or must
>>> get_timer() always return a sensible result even with interrupts
>>> disabled?
>>
>> Wouldn't it make sense to move the bootm_disable_interrupts() call to
>> after loading and uncompressing the OS image? To right before jumping
>> to the OS?
> 
> No, because the point of disabling interrupts is that we may start
> writing to physical address 0 (e.g. if that's the load= address in the
> FIT image), which is also where the interrupt vectors reside - i.e.,
> we're about to overwrite 0x900 (the decrementer interrupt vector), so if
> we don't disable interrupts, we'll crash on the very next decrementer
> interrupt (i.e., within one millisecond).

FWIW, the below very hacky patch makes get_timer() return sensible
results on ppc even when interrupts are disabled, and hence ensures that
the watchdog does get petted. It's obviously not meant for inclusion as
is (it's prepared for being a proper config option, but for toying
around it's easier to have it all in one file - also, I don't really
like the name of the config knob). But it's also kind of expensive to do
that do_div(), so I'm not sure I think this is even the right approach.

You previously rejected allowing the board to provide an override for
the rate-limiting, and at least the "hw_margin_ms" parsing now solves
part of what I wanted to use that for. What about implementing the
rate-limiting instead in terms of get_ticks() (the hw_margin_ms can
trivially be translated to a number of ticks at init time - there's
already a usec_to_tick helper)? Are there any boards where get_ticks()
doesn't return something sensible?

Rasmus

_Not_ for inclusion:

diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c
index 23ac5bca1e..e6b6a967ae 100644
--- a/arch/powerpc/lib/interrupts.c
+++ b/arch/powerpc/lib/interrupts.c
@@ -10,11 +10,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef CONFIG_LED_STATUS
 #include 
 #endif

+#define CONFIG_GET_TIMER_IRQ 1
+
 #ifndef CONFIG_MPC83XX_TIMER
 #ifndef CONFIG_SYS_WATCHDOG_FREQ
 #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2)
@@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val)
 }
 #endif /* !CONFIG_MPC83XX_TIMER */

+static u64 irq_off_ticks = 0;
+static int interrupts_enabled = 0;
+static volatile ulong timestamp = 0;
+
+static u32 irq_off_msecs(void)
+{
+   u64 t;
+   u32 d = get_tbclk();
+
+   if (!d)
+   return 0;
+   t = get_ticks() - irq_off_ticks;
+   t *= 1000;
+   do_div(t, d);
+   return t;
+}
+
 void enable_interrupts(void)
 {
+   ulong msr = get_msr ();
+
+   if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) {
+   /* Account for the time interrupts were off. */
+   timestamp += irq_off_msecs();
+   interrupts_enabled = 1;
+   }
+
set_msr (get_msr () | MSR_EE);
 }

@@ -50,6 +78,13 @@ int disable_interrupts(void)
ulong msr = get_msr ();

set_msr (msr & ~MSR_EE);
+
+   if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) {
+   /* Record when interrupts were disabled. */
+   irq_off_ticks = get_ticks();
+   interrupts_enabled = 0;
+   }
+
return ((msr & MSR_EE) != 0);
 }

@@ -61,13 +96,11 @@ int interrupt_init(void)

set_dec (decrementer_count);

-   set_msr (get_msr () | MSR_EE);
+   enable_interrupts();

return (0);
 }

-static volatile ulong timestamp = 0;
-
 void timer_interrupt(struct pt_regs *regs)
 {
/* call cpu specific function from $(CPU)/interrupts.c */
@@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs)

 ulong get_timer (ulong base)
 {
-   return (timestamp - base);
+   ulong ts = timestamp;
+
+   /* If called within an irq-off section, account for the time since
irqs were turned off. */
+   if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !interrupts_enabled)
+   ts += irq_off_msecs();
+
+   return (ts - base);
 }
 #endif /* !CONFIG_MPC83XX_TIMER */



-- 
Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villem...@prevas.dk
www.prevas.dk


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-06-02 Thread Rasmus Villemoes
On 02/06/2020 16.53, Stefan Roese wrote:
> On 02.06.20 15:29, Rasmus Villemoes wrote:
>> On 16/03/2020 16.52, Rasmus Villemoes wrote:
>>> On 14/03/2020 13.04, Stefan Roese wrote:
 On 13.03.20 17:04, Rasmus Villemoes wrote:
>>>
> That at least solves half my problems and
> might be useful to others as well. Then I'll have to figure out the
> time-stands-still problem in some other way.

 If its too hard to enable interrupts in SPL for you or to provide some
 other means of a working get_timer() API, then we needto find another
 solution. You started with this weak function, which of course works.
 What other options are there? Adding a callback mechanism to register
 platform specific callback functions? Even though this might get a
 little bit too complicated.
>>>
>>> Now that I dig a bit more into this, I seem to remember that we actually
>>> also had problems in U-Boot proper when loading a compressed kernel, so
>>> for now we're using an uncompressed kernel in our FIT images. I will
>>> have to re-investigate, but it now occurs to me that it might be due to
>>> the fact that interrupts get disabled during bootm (which makes sense,
>>> the same reason I stated previously of interrupt vectors about to be
>>> overwritten), so even in U-Boot proper, time as measured by get_timer()
>>> ceases to pass after that point, so all the WATCHDOG_RESET() calls from
>>> the inflate code effectively get ignored.
>>>
>>> So it may be necessary to have some wdt_ratelimit_disable() hook that
>>> can be called from bootm_disable_interrupts() and e.g. some
>>> board-specific SPL code. I'll do some experiments and figure out if I do
>>> indeed need such a hook.
>>
>> OK, I have now had time to do some more experiments. I have enabled the
>> timer tick in SPL, so get_timer() now "normally" works. Together with
>> the .dts based read of the hardware margin, that makes the watchdog
>> handling mostly work.
>>
>> But, as I suspected, I do have a problem when loading a compressed
>> kernel image - what I write above "so even in U-Boot proper, time as
>> measured by get_timer() ceases to pass after that point, so all the
>> WATCHDOG_RESET() calls from the inflate code effectively get ignored."
>> is indeed the case.
>>
>> So, what's the best way to proceed? Should there be a hook disabling the
>>   rate-limiting logic that bootm_disable_interrupts() can call? Or must
>> get_timer() always return a sensible result even with interrupts
>> disabled?
> 
> Wouldn't it make sense to move the bootm_disable_interrupts() call to
> after loading and uncompressing the OS image? To right before jumping
> to the OS?

No, because the point of disabling interrupts is that we may start
writing to physical address 0 (e.g. if that's the load= address in the
FIT image), which is also where the interrupt vectors reside - i.e.,
we're about to overwrite 0x900 (the decrementer interrupt vector), so if
we don't disable interrupts, we'll crash on the very next decrementer
interrupt (i.e., within one millisecond).

Rasmus


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-06-02 Thread Stefan Roese

On 02.06.20 15:29, Rasmus Villemoes wrote:

On 16/03/2020 16.52, Rasmus Villemoes wrote:

On 14/03/2020 13.04, Stefan Roese wrote:

On 13.03.20 17:04, Rasmus Villemoes wrote:



That at least solves half my problems and
might be useful to others as well. Then I'll have to figure out the
time-stands-still problem in some other way.


If its too hard to enable interrupts in SPL for you or to provide some
other means of a working get_timer() API, then we needto find another
solution. You started with this weak function, which of course works.
What other options are there? Adding a callback mechanism to register
platform specific callback functions? Even though this might get a
little bit too complicated.


Now that I dig a bit more into this, I seem to remember that we actually
also had problems in U-Boot proper when loading a compressed kernel, so
for now we're using an uncompressed kernel in our FIT images. I will
have to re-investigate, but it now occurs to me that it might be due to
the fact that interrupts get disabled during bootm (which makes sense,
the same reason I stated previously of interrupt vectors about to be
overwritten), so even in U-Boot proper, time as measured by get_timer()
ceases to pass after that point, so all the WATCHDOG_RESET() calls from
the inflate code effectively get ignored.

So it may be necessary to have some wdt_ratelimit_disable() hook that
can be called from bootm_disable_interrupts() and e.g. some
board-specific SPL code. I'll do some experiments and figure out if I do
indeed need such a hook.


OK, I have now had time to do some more experiments. I have enabled the
timer tick in SPL, so get_timer() now "normally" works. Together with
the .dts based read of the hardware margin, that makes the watchdog
handling mostly work.

But, as I suspected, I do have a problem when loading a compressed
kernel image - what I write above "so even in U-Boot proper, time as
measured by get_timer() ceases to pass after that point, so all the
WATCHDOG_RESET() calls from the inflate code effectively get ignored."
is indeed the case.

So, what's the best way to proceed? Should there be a hook disabling the
  rate-limiting logic that bootm_disable_interrupts() can call? Or must
get_timer() always return a sensible result even with interrupts disabled?


Wouldn't it make sense to move the bootm_disable_interrupts() call to
after loading and uncompressing the OS image? To right before jumping
to the OS?

Thanks,
Stefan


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-06-02 Thread Rasmus Villemoes
On 16/03/2020 16.52, Rasmus Villemoes wrote:
> On 14/03/2020 13.04, Stefan Roese wrote:
>> On 13.03.20 17:04, Rasmus Villemoes wrote:
> 
>>> That at least solves half my problems and
>>> might be useful to others as well. Then I'll have to figure out the
>>> time-stands-still problem in some other way.
>>
>> If its too hard to enable interrupts in SPL for you or to provide some
>> other means of a working get_timer() API, then we needto find another
>> solution. You started with this weak function, which of course works.
>> What other options are there? Adding a callback mechanism to register
>> platform specific callback functions? Even though this might get a
>> little bit too complicated.
> 
> Now that I dig a bit more into this, I seem to remember that we actually
> also had problems in U-Boot proper when loading a compressed kernel, so
> for now we're using an uncompressed kernel in our FIT images. I will
> have to re-investigate, but it now occurs to me that it might be due to
> the fact that interrupts get disabled during bootm (which makes sense,
> the same reason I stated previously of interrupt vectors about to be
> overwritten), so even in U-Boot proper, time as measured by get_timer()
> ceases to pass after that point, so all the WATCHDOG_RESET() calls from
> the inflate code effectively get ignored.
> 
> So it may be necessary to have some wdt_ratelimit_disable() hook that
> can be called from bootm_disable_interrupts() and e.g. some
> board-specific SPL code. I'll do some experiments and figure out if I do
> indeed need such a hook.

OK, I have now had time to do some more experiments. I have enabled the
timer tick in SPL, so get_timer() now "normally" works. Together with
the .dts based read of the hardware margin, that makes the watchdog
handling mostly work.

But, as I suspected, I do have a problem when loading a compressed
kernel image - what I write above "so even in U-Boot proper, time as
measured by get_timer() ceases to pass after that point, so all the
WATCHDOG_RESET() calls from the inflate code effectively get ignored."
is indeed the case.

So, what's the best way to proceed? Should there be a hook disabling the
 rate-limiting logic that bootm_disable_interrupts() can call? Or must
get_timer() always return a sensible result even with interrupts disabled?

Rasmus


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-04-15 Thread Stefan Roese

On 13.03.20 17:04, Rasmus Villemoes wrote:

Some watchdogs must be reset more often than the once-per-second
ratelimit used by the generic watchdog_reset function in
wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux)
for using a property called hw_margin_ms to let the device tree tell
the driver how often the device needs resetting. So use that
generically. No change in default behaviour.

On top of https://patchwork.ozlabs.org/patch/1242772/ .

Stefan, something like this? That at least solves half my problems and
might be useful to others as well. Then I'll have to figure out the
time-stands-still problem in some other way.

Rasmus Villemoes (3):
   watchdog: remove stale ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS from wdt.h
   watchdog: move initr_watchdog() to wdt-uclass.c
   watchdog: honour hw_margin_ms DT property

  drivers/watchdog/wdt-uclass.c | 43 ++-
  include/wdt.h | 38 +--
  2 files changed, 43 insertions(+), 38 deletions(-)



Series applied to u-boot-marvell/master

Thanks,
Stefan


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-03-16 Thread Rasmus Villemoes
On 14/03/2020 13.04, Stefan Roese wrote:
> On 13.03.20 17:04, Rasmus Villemoes wrote:

>> That at least solves half my problems and
>> might be useful to others as well. Then I'll have to figure out the
>> time-stands-still problem in some other way.
> 
> If its too hard to enable interrupts in SPL for you or to provide some
> other means of a working get_timer() API, then we needto find another
> solution. You started with this weak function, which of course works.
> What other options are there? Adding a callback mechanism to register
> platform specific callback functions? Even though this might get a
> little bit too complicated.

Now that I dig a bit more into this, I seem to remember that we actually
also had problems in U-Boot proper when loading a compressed kernel, so
for now we're using an uncompressed kernel in our FIT images. I will
have to re-investigate, but it now occurs to me that it might be due to
the fact that interrupts get disabled during bootm (which makes sense,
the same reason I stated previously of interrupt vectors about to be
overwritten), so even in U-Boot proper, time as measured by get_timer()
ceases to pass after that point, so all the WATCHDOG_RESET() calls from
the inflate code effectively get ignored.

So it may be necessary to have some wdt_ratelimit_disable() hook that
can be called from bootm_disable_interrupts() and e.g. some
board-specific SPL code. I'll do some experiments and figure out if I do
indeed need such a hook.

Rasmus


Re: [PATCH 0/3] watchdog: honour hw_margin_ms property

2020-03-14 Thread Stefan Roese

On 13.03.20 17:04, Rasmus Villemoes wrote:

Some watchdogs must be reset more often than the once-per-second
ratelimit used by the generic watchdog_reset function in
wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux)
for using a property called hw_margin_ms to let the device tree tell
the driver how often the device needs resetting. So use that
generically. No change in default behaviour.

On top of https://patchwork.ozlabs.org/patch/1242772/ .

Stefan, something like this?


Yes, thanks for looking into this.


That at least solves half my problems and
might be useful to others as well. Then I'll have to figure out the
time-stands-still problem in some other way.


If its too hard to enable interrupts in SPL for you or to provide some
other means of a working get_timer() API, then we needto find another
solution. You started with this weak function, which of course works.
What other options are there? Adding a callback mechanism to register
platform specific callback functions? Even though this might get a
little bit too complicated.

Thanks,
Stefan