Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-15 Thread Guenter Roeck

On 08/14/2015 04:23 AM, Uwe Kleine-König wrote:

Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:

[...]
@@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;

  static int watchdog_ping(struct watchdog_device *wdd)
  {
-   int err = 0;
+   int err;

mutex_lock(>lock);
+   err = _watchdog_ping(wdd);
+   wdd->last_keepalive = jiffies;
+   watchdog_update_worker(wdd, false, false);
+   mutex_unlock(>lock);

-   if (test_bit(WDOG_UNREGISTERED, >status)) {
-   err = -ENODEV;
-   goto out_ping;
-   }
+   return err;
+}

-   if (!watchdog_active(wdd))
-   goto out_ping;
+static void watchdog_ping_work(struct work_struct *work)
+{
+   struct watchdog_device *wdd;

-   if (wdd->ops->ping)
-   err = wdd->ops->ping(wdd);/* ping the watchdog */
-   else
-   err = wdd->ops->start(wdd);   /* restart watchdog */
+   wdd = container_of(to_delayed_work(work), struct watchdog_device, work);

-out_ping:
+   mutex_lock(>lock);
+   _watchdog_ping(wdd);
+   watchdog_update_worker(wdd, false, false);
mutex_unlock(>lock);
-   return err;
  }

  /*
@@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
goto out_start;

err = wdd->ops->start(wdd);
-   if (err == 0)
+   if (err == 0) {
set_bit(WDOG_ACTIVE, >status);
+   wdd->last_keepalive = jiffies;
+   watchdog_update_worker(wdd, false, false);
+   }

  out_start:
mutex_unlock(>lock);
@@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
}

err = wdd->ops->stop(wdd);
-   if (err == 0)
+   if (err == 0) {
clear_bit(WDOG_ACTIVE, >status);
+   watchdog_update_worker(wdd, true, false);
+   }

  out_stop:
mutex_unlock(>lock);
@@ -211,6 +291,8 @@ static int watchdog_set_timeout(struct watchdog_device *wdd,

err = wdd->ops->set_timeout(wdd, timeout);

+   watchdog_update_worker(wdd, true, false);


I still try to wrap my head around this function. You pass cancel=true
for stop and set_timeout to ensure that the worker doesn't continue to
run. That's fine.

For watchdog_start you pass cancel=false. I guess the background is that
after one of the next patches the worker might already run for handling
the watchdog being unstoppable. Maybe it's easier to grasp the logic if
you don't try to be too clever here: stop the worker on start
unconditionally and possibly restart it if the hardware needs extra
poking to fulfil the timeout set?


I thought it would reduce the amount of code, and I thought it would be
more confusing and complicated to first call cancel the worker followed
by a (conditional) start. No strong opinion, though; I'll be happy to
make that change in exchange for a Reviewed-by:.



+   if (!watchdog_wq)
+   return -ENODEV;
+
+   INIT_DELAYED_WORK(>work, watchdog_ping_work);
+
+   if (!wdd->max_hw_timeout_ms)
+   wdd->max_hw_timeout_ms = wdd->max_timeout * 1000;


With this (and assuming wdd->max_timeout > 0) the check for
max_hw_timeout_ms != 0 is not necessary, is it?


With the logical change I am making, to ignore max_timeout if max_hw_timeout_ms
is configured, it is indeed no longer necessary (nor desirable).


+
if (wdd->id == 0) {
old_wdd = wdd;
watchdog_miscdev.parent = wdd->parent;
[...]
@@ -585,9 +680,21 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)

  int __init watchdog_dev_init(void)
  {
-   int err = alloc_chrdev_region(_devt, 0, MAX_DOGS, "watchdog");
+   int err;
+
+   watchdog_wq = alloc_workqueue("watchdogd",
+ WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+   if (!watchdog_wq) {
+   pr_err("Failed to create watchdog workqueue\n");
+   err = -ENOMEM;
+   goto abort;


Why not return -ENOMEM directly?


No idea. Changed.

Thanks,
Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-15 Thread Guenter Roeck

Hi Uwe,

On 08/13/2015 02:13 PM, Uwe Kleine-König wrote:

Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:

Introduce an optional hardware maximum timeout in the watchdog core.
The hardware maximum timeout can be lower than the maximum timeout.

Drivers can set the maximum hardware timeout value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware timeout,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Cc: Timo Kokkonen 
Cc: Uwe Kleine-König 
Signed-off-by: Guenter Roeck 
---
v2:
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
   come last.
- The code now ensures that the watchdog times out  seconds after
   the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
   warning message. Reason is that it will now stop early, while there
   may still be a substantial amount of time for keepalives from user space
   to arrive. If such keepalives arrive late (for example if user space
   is configured to send keepalives just a few seconds before the watchdog
   times out), the message would just be noise and not provide any value.
---
  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
  drivers/watchdog/watchdog_dev.c| 140 ++---
  include/linux/watchdog.h   |  26 +++--
  3 files changed, 163 insertions(+), 26 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..25b00b878a7b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,9 +53,12 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+   unsigned int max_hw_timeout_ms;
void *driver_data;
-   struct mutex lock;
unsigned long status;
+   struct mutex lock;
+   unsigned long last_keepalive;
+   struct delayed_work work;
struct list_head deferred;
  };

@@ -73,18 +76,28 @@ It contains following fields:
additional information about the watchdog timer itself. (Like it's unique 
name)
  * ops: a pointer to the list of watchdog operations that the watchdog 
supports.
  * timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if user space does
+  not send a heartbeat request if WDOG_ACTIVE is set.
  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
+* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
+  from max_timeout. If set to a value larger than max_timeout, the
+  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
+  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
+  space failed to send a heartbeat for at least 'timeout' seconds.

To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?



I think I found a better solution: Declare that max_timeout won't be used
if max_hw_timeout_ms is provided. This also simplifies the code a lot.


  * bootstatus: status of the device after booting (reported with watchdog
WDIOF_* status bits).
  * driver_data: a pointer to the drivers private data of a watchdog device.
This data should only be accessed via the watchdog_set_drvdata and
watchdog_get_drvdata routines.
-* lock: Mutex for WatchDog Timer Driver Core internal use only.
  * status: this field contains a number of status bits that give extra
information about the status of the device (Like: is the watchdog timer
running/active, is the nowayout bit set, is the device opened via
the /dev/watchdog interface or not, ...).
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
+* last_keepalive: Time of most recent keepalive triggered from user space,
+  in jiffies.
+* work: Worker data structure for WatchDog Timer Driver Core internal use only.
  * deferred: entry in wtd_deferred_reg_list which is used to
register early initialized watchdogs.

@@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
and -EIO for "could not write value to the watchdog". On success this
routine should set the timeout value of the watchdog_device to the
achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily has a 1 second resolution).
+  because the watchdog does not necessarily have a 1 second resolution).
+  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
+  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
+  timeout value of the watchdog_device 

Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-15 Thread Guenter Roeck

On 08/14/2015 04:23 AM, Uwe Kleine-König wrote:

Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:

[...]
@@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;

  static int watchdog_ping(struct watchdog_device *wdd)
  {
-   int err = 0;
+   int err;

mutex_lock(wdd-lock);
+   err = _watchdog_ping(wdd);
+   wdd-last_keepalive = jiffies;
+   watchdog_update_worker(wdd, false, false);
+   mutex_unlock(wdd-lock);

-   if (test_bit(WDOG_UNREGISTERED, wdd-status)) {
-   err = -ENODEV;
-   goto out_ping;
-   }
+   return err;
+}

-   if (!watchdog_active(wdd))
-   goto out_ping;
+static void watchdog_ping_work(struct work_struct *work)
+{
+   struct watchdog_device *wdd;

-   if (wdd-ops-ping)
-   err = wdd-ops-ping(wdd);/* ping the watchdog */
-   else
-   err = wdd-ops-start(wdd);   /* restart watchdog */
+   wdd = container_of(to_delayed_work(work), struct watchdog_device, work);

-out_ping:
+   mutex_lock(wdd-lock);
+   _watchdog_ping(wdd);
+   watchdog_update_worker(wdd, false, false);
mutex_unlock(wdd-lock);
-   return err;
  }

  /*
@@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
goto out_start;

err = wdd-ops-start(wdd);
-   if (err == 0)
+   if (err == 0) {
set_bit(WDOG_ACTIVE, wdd-status);
+   wdd-last_keepalive = jiffies;
+   watchdog_update_worker(wdd, false, false);
+   }

  out_start:
mutex_unlock(wdd-lock);
@@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
}

err = wdd-ops-stop(wdd);
-   if (err == 0)
+   if (err == 0) {
clear_bit(WDOG_ACTIVE, wdd-status);
+   watchdog_update_worker(wdd, true, false);
+   }

  out_stop:
mutex_unlock(wdd-lock);
@@ -211,6 +291,8 @@ static int watchdog_set_timeout(struct watchdog_device *wdd,

err = wdd-ops-set_timeout(wdd, timeout);

+   watchdog_update_worker(wdd, true, false);


I still try to wrap my head around this function. You pass cancel=true
for stop and set_timeout to ensure that the worker doesn't continue to
run. That's fine.

For watchdog_start you pass cancel=false. I guess the background is that
after one of the next patches the worker might already run for handling
the watchdog being unstoppable. Maybe it's easier to grasp the logic if
you don't try to be too clever here: stop the worker on start
unconditionally and possibly restart it if the hardware needs extra
poking to fulfil the timeout set?


I thought it would reduce the amount of code, and I thought it would be
more confusing and complicated to first call cancel the worker followed
by a (conditional) start. No strong opinion, though; I'll be happy to
make that change in exchange for a Reviewed-by:.



+   if (!watchdog_wq)
+   return -ENODEV;
+
+   INIT_DELAYED_WORK(wdd-work, watchdog_ping_work);
+
+   if (!wdd-max_hw_timeout_ms)
+   wdd-max_hw_timeout_ms = wdd-max_timeout * 1000;


With this (and assuming wdd-max_timeout  0) the check for
max_hw_timeout_ms != 0 is not necessary, is it?


With the logical change I am making, to ignore max_timeout if max_hw_timeout_ms
is configured, it is indeed no longer necessary (nor desirable).


+
if (wdd-id == 0) {
old_wdd = wdd;
watchdog_miscdev.parent = wdd-parent;
[...]
@@ -585,9 +680,21 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)

  int __init watchdog_dev_init(void)
  {
-   int err = alloc_chrdev_region(watchdog_devt, 0, MAX_DOGS, watchdog);
+   int err;
+
+   watchdog_wq = alloc_workqueue(watchdogd,
+ WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+   if (!watchdog_wq) {
+   pr_err(Failed to create watchdog workqueue\n);
+   err = -ENOMEM;
+   goto abort;


Why not return -ENOMEM directly?


No idea. Changed.

Thanks,
Guenter

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-15 Thread Guenter Roeck

Hi Uwe,

On 08/13/2015 02:13 PM, Uwe Kleine-König wrote:

Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:

Introduce an optional hardware maximum timeout in the watchdog core.
The hardware maximum timeout can be lower than the maximum timeout.

Drivers can set the maximum hardware timeout value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware timeout,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Cc: Timo Kokkonen timo.kokko...@offcode.fi
Cc: Uwe Kleine-König u.kleine-koe...@pengutronix.de
Signed-off-by: Guenter Roeck li...@roeck-us.net
---
v2:
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
   come last.
- The code now ensures that the watchdog times out timeout seconds after
   the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
   warning message. Reason is that it will now stop early, while there
   may still be a substantial amount of time for keepalives from user space
   to arrive. If such keepalives arrive late (for example if user space
   is configured to send keepalives just a few seconds before the watchdog
   times out), the message would just be noise and not provide any value.
---
  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
  drivers/watchdog/watchdog_dev.c| 140 ++---
  include/linux/watchdog.h   |  26 +++--
  3 files changed, 163 insertions(+), 26 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..25b00b878a7b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,9 +53,12 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+   unsigned int max_hw_timeout_ms;
void *driver_data;
-   struct mutex lock;
unsigned long status;
+   struct mutex lock;
+   unsigned long last_keepalive;
+   struct delayed_work work;
struct list_head deferred;
  };

@@ -73,18 +76,28 @@ It contains following fields:
additional information about the watchdog timer itself. (Like it's unique 
name)
  * ops: a pointer to the list of watchdog operations that the watchdog 
supports.
  * timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if user space does
+  not send a heartbeat request if WDOG_ACTIVE is set.
  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
+* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
+  from max_timeout. If set to a value larger than max_timeout, the
+  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
+  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
+  space failed to send a heartbeat for at least 'timeout' seconds.

To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?



I think I found a better solution: Declare that max_timeout won't be used
if max_hw_timeout_ms is provided. This also simplifies the code a lot.


  * bootstatus: status of the device after booting (reported with watchdog
WDIOF_* status bits).
  * driver_data: a pointer to the drivers private data of a watchdog device.
This data should only be accessed via the watchdog_set_drvdata and
watchdog_get_drvdata routines.
-* lock: Mutex for WatchDog Timer Driver Core internal use only.
  * status: this field contains a number of status bits that give extra
information about the status of the device (Like: is the watchdog timer
running/active, is the nowayout bit set, is the device opened via
the /dev/watchdog interface or not, ...).
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
+* last_keepalive: Time of most recent keepalive triggered from user space,
+  in jiffies.
+* work: Worker data structure for WatchDog Timer Driver Core internal use only.
  * deferred: entry in wtd_deferred_reg_list which is used to
register early initialized watchdogs.

@@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
and -EIO for could not write value to the watchdog. On success this
routine should set the timeout value of the watchdog_device to the
achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily has a 1 second resolution).
+  because the watchdog does not necessarily have a 1 second resolution).
+  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
+  to the minimum of timeout and 

Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-14 Thread Uwe Kleine-König
Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
> [...]
> @@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;
>  
>  static int watchdog_ping(struct watchdog_device *wdd)
>  {
> - int err = 0;
> + int err;
>  
>   mutex_lock(>lock);
> + err = _watchdog_ping(wdd);
> + wdd->last_keepalive = jiffies;
> + watchdog_update_worker(wdd, false, false);
> + mutex_unlock(>lock);
>  
> - if (test_bit(WDOG_UNREGISTERED, >status)) {
> - err = -ENODEV;
> - goto out_ping;
> - }
> + return err;
> +}
>  
> - if (!watchdog_active(wdd))
> - goto out_ping;
> +static void watchdog_ping_work(struct work_struct *work)
> +{
> + struct watchdog_device *wdd;
>  
> - if (wdd->ops->ping)
> - err = wdd->ops->ping(wdd);  /* ping the watchdog */
> - else
> - err = wdd->ops->start(wdd); /* restart watchdog */
> + wdd = container_of(to_delayed_work(work), struct watchdog_device, work);
>  
> -out_ping:
> + mutex_lock(>lock);
> + _watchdog_ping(wdd);
> + watchdog_update_worker(wdd, false, false);
>   mutex_unlock(>lock);
> - return err;
>  }
>  
>  /*
> @@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
>   goto out_start;
>  
>   err = wdd->ops->start(wdd);
> - if (err == 0)
> + if (err == 0) {
>   set_bit(WDOG_ACTIVE, >status);
> + wdd->last_keepalive = jiffies;
> + watchdog_update_worker(wdd, false, false);
> + }
>  
>  out_start:
>   mutex_unlock(>lock);
> @@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
>   }
>  
>   err = wdd->ops->stop(wdd);
> - if (err == 0)
> + if (err == 0) {
>   clear_bit(WDOG_ACTIVE, >status);
> + watchdog_update_worker(wdd, true, false);
> + }
>  
>  out_stop:
>   mutex_unlock(>lock);
> @@ -211,6 +291,8 @@ static int watchdog_set_timeout(struct watchdog_device 
> *wdd,
>  
>   err = wdd->ops->set_timeout(wdd, timeout);
>  
> + watchdog_update_worker(wdd, true, false);

I still try to wrap my head around this function. You pass cancel=true
for stop and set_timeout to ensure that the worker doesn't continue to
run. That's fine.

For watchdog_start you pass cancel=false. I guess the background is that
after one of the next patches the worker might already run for handling
the watchdog being unstoppable. Maybe it's easier to grasp the logic if
you don't try to be too clever here: stop the worker on start
unconditionally and possibly restart it if the hardware needs extra
poking to fulfil the timeout set?

> + if (!watchdog_wq)
> + return -ENODEV;
> +
> + INIT_DELAYED_WORK(>work, watchdog_ping_work);
> +
> + if (!wdd->max_hw_timeout_ms)
> + wdd->max_hw_timeout_ms = wdd->max_timeout * 1000;

With this (and assuming wdd->max_timeout > 0) the check for
max_hw_timeout_ms != 0 is not necessary, is it?

> +
>   if (wdd->id == 0) {
>   old_wdd = wdd;
>   watchdog_miscdev.parent = wdd->parent;
> [...]
> @@ -585,9 +680,21 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)
>  
>  int __init watchdog_dev_init(void)
>  {
> - int err = alloc_chrdev_region(_devt, 0, MAX_DOGS, "watchdog");
> + int err;
> +
> + watchdog_wq = alloc_workqueue("watchdogd",
> +   WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
> + if (!watchdog_wq) {
> + pr_err("Failed to create watchdog workqueue\n");
> + err = -ENOMEM;
> + goto abort;

Why not return -ENOMEM directly?

> + }
> +
> + err = alloc_chrdev_region(_devt, 0, MAX_DOGS, "watchdog");
>   if (err < 0)
>   pr_err("watchdog: unable to allocate char dev region\n");
> +
> +abort:
>   return err;

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-14 Thread Uwe Kleine-König
Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
 [...]
 @@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;
  
  static int watchdog_ping(struct watchdog_device *wdd)
  {
 - int err = 0;
 + int err;
  
   mutex_lock(wdd-lock);
 + err = _watchdog_ping(wdd);
 + wdd-last_keepalive = jiffies;
 + watchdog_update_worker(wdd, false, false);
 + mutex_unlock(wdd-lock);
  
 - if (test_bit(WDOG_UNREGISTERED, wdd-status)) {
 - err = -ENODEV;
 - goto out_ping;
 - }
 + return err;
 +}
  
 - if (!watchdog_active(wdd))
 - goto out_ping;
 +static void watchdog_ping_work(struct work_struct *work)
 +{
 + struct watchdog_device *wdd;
  
 - if (wdd-ops-ping)
 - err = wdd-ops-ping(wdd);  /* ping the watchdog */
 - else
 - err = wdd-ops-start(wdd); /* restart watchdog */
 + wdd = container_of(to_delayed_work(work), struct watchdog_device, work);
  
 -out_ping:
 + mutex_lock(wdd-lock);
 + _watchdog_ping(wdd);
 + watchdog_update_worker(wdd, false, false);
   mutex_unlock(wdd-lock);
 - return err;
  }
  
  /*
 @@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
   goto out_start;
  
   err = wdd-ops-start(wdd);
 - if (err == 0)
 + if (err == 0) {
   set_bit(WDOG_ACTIVE, wdd-status);
 + wdd-last_keepalive = jiffies;
 + watchdog_update_worker(wdd, false, false);
 + }
  
  out_start:
   mutex_unlock(wdd-lock);
 @@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
   }
  
   err = wdd-ops-stop(wdd);
 - if (err == 0)
 + if (err == 0) {
   clear_bit(WDOG_ACTIVE, wdd-status);
 + watchdog_update_worker(wdd, true, false);
 + }
  
  out_stop:
   mutex_unlock(wdd-lock);
 @@ -211,6 +291,8 @@ static int watchdog_set_timeout(struct watchdog_device 
 *wdd,
  
   err = wdd-ops-set_timeout(wdd, timeout);
  
 + watchdog_update_worker(wdd, true, false);

I still try to wrap my head around this function. You pass cancel=true
for stop and set_timeout to ensure that the worker doesn't continue to
run. That's fine.

For watchdog_start you pass cancel=false. I guess the background is that
after one of the next patches the worker might already run for handling
the watchdog being unstoppable. Maybe it's easier to grasp the logic if
you don't try to be too clever here: stop the worker on start
unconditionally and possibly restart it if the hardware needs extra
poking to fulfil the timeout set?

 + if (!watchdog_wq)
 + return -ENODEV;
 +
 + INIT_DELAYED_WORK(wdd-work, watchdog_ping_work);
 +
 + if (!wdd-max_hw_timeout_ms)
 + wdd-max_hw_timeout_ms = wdd-max_timeout * 1000;

With this (and assuming wdd-max_timeout  0) the check for
max_hw_timeout_ms != 0 is not necessary, is it?

 +
   if (wdd-id == 0) {
   old_wdd = wdd;
   watchdog_miscdev.parent = wdd-parent;
 [...]
 @@ -585,9 +680,21 @@ int watchdog_dev_unregister(struct watchdog_device *wdd)
  
  int __init watchdog_dev_init(void)
  {
 - int err = alloc_chrdev_region(watchdog_devt, 0, MAX_DOGS, watchdog);
 + int err;
 +
 + watchdog_wq = alloc_workqueue(watchdogd,
 +   WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
 + if (!watchdog_wq) {
 + pr_err(Failed to create watchdog workqueue\n);
 + err = -ENOMEM;
 + goto abort;

Why not return -ENOMEM directly?

 + }
 +
 + err = alloc_chrdev_region(watchdog_devt, 0, MAX_DOGS, watchdog);
   if (err  0)
   pr_err(watchdog: unable to allocate char dev region\n);
 +
 +abort:
   return err;

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-13 Thread Uwe Kleine-König
Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
> Introduce an optional hardware maximum timeout in the watchdog core.
> The hardware maximum timeout can be lower than the maximum timeout.
> 
> Drivers can set the maximum hardware timeout value in the watchdog data
> structure. If the configured timeout exceeds the maximum hardware timeout,
> the watchdog core enables a timer function to assist sending keepalive
> requests to the watchdog driver.
> 
> Cc: Timo Kokkonen 
> Cc: Uwe Kleine-König 
> Signed-off-by: Guenter Roeck 
> ---
> v2:
> - Improved and hopefully clarified documentation.
> - Rearranged variables in struct watchdog_device such that internal variables
>   come last.
> - The code now ensures that the watchdog times out  seconds after
>   the most recent keepalive sent from user space.
> - The internal keepalive now stops silently and no longer generates a
>   warning message. Reason is that it will now stop early, while there
>   may still be a substantial amount of time for keepalives from user space
>   to arrive. If such keepalives arrive late (for example if user space
>   is configured to send keepalives just a few seconds before the watchdog
>   times out), the message would just be noise and not provide any value.
> ---
>  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
>  drivers/watchdog/watchdog_dev.c| 140 
> ++---
>  include/linux/watchdog.h   |  26 +++--
>  3 files changed, 163 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
> b/Documentation/watchdog/watchdog-kernel-api.txt
> index d8b0d3367706..25b00b878a7b 100644
> --- a/Documentation/watchdog/watchdog-kernel-api.txt
> +++ b/Documentation/watchdog/watchdog-kernel-api.txt
> @@ -53,9 +53,12 @@ struct watchdog_device {
>   unsigned int timeout;
>   unsigned int min_timeout;
>   unsigned int max_timeout;
> + unsigned int max_hw_timeout_ms;
>   void *driver_data;
> - struct mutex lock;
>   unsigned long status;
> + struct mutex lock;
> + unsigned long last_keepalive;
> + struct delayed_work work;
>   struct list_head deferred;
>  };
>  
> @@ -73,18 +76,28 @@ It contains following fields:
>additional information about the watchdog timer itself. (Like it's unique 
> name)
>  * ops: a pointer to the list of watchdog operations that the watchdog 
> supports.
>  * timeout: the watchdog timer's timeout value (in seconds).
> +  This is the time after which the system will reboot if user space does
> +  not send a heartbeat request if WDOG_ACTIVE is set.
>  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
>  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
> +* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
> +  from max_timeout. If set to a value larger than max_timeout, the
> +  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
> +  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
> +  space failed to send a heartbeat for at least 'timeout' seconds.
To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?

>  * bootstatus: status of the device after booting (reported with watchdog
>WDIOF_* status bits).
>  * driver_data: a pointer to the drivers private data of a watchdog device.
>This data should only be accessed via the watchdog_set_drvdata and
>watchdog_get_drvdata routines.
> -* lock: Mutex for WatchDog Timer Driver Core internal use only.
>  * status: this field contains a number of status bits that give extra
>information about the status of the device (Like: is the watchdog timer
>running/active, is the nowayout bit set, is the device opened via
>the /dev/watchdog interface or not, ...).
> +* lock: Mutex for WatchDog Timer Driver Core internal use only.
> +* last_keepalive: Time of most recent keepalive triggered from user space,
> +  in jiffies.
> +* work: Worker data structure for WatchDog Timer Driver Core internal use 
> only.
>  * deferred: entry in wtd_deferred_reg_list which is used to
>register early initialized watchdogs.
>  
> @@ -160,7 +173,11 @@ they are supported. These optional routines/operations 
> are:
>and -EIO for "could not write value to the watchdog". On success this
>routine should set the timeout value of the watchdog_device to the
>achieved timeout value (which may be different from the requested one
> -  because the watchdog does not necessarily has a 1 second resolution).
> +  because the watchdog does not necessarily have a 1 second resolution).
> +  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
> +  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
> +  timeout value of the watchdog_device either to the requested timeout value
> +  (if it is larger than 

Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-13 Thread Uwe Kleine-König
Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
 Introduce an optional hardware maximum timeout in the watchdog core.
 The hardware maximum timeout can be lower than the maximum timeout.
 
 Drivers can set the maximum hardware timeout value in the watchdog data
 structure. If the configured timeout exceeds the maximum hardware timeout,
 the watchdog core enables a timer function to assist sending keepalive
 requests to the watchdog driver.
 
 Cc: Timo Kokkonen timo.kokko...@offcode.fi
 Cc: Uwe Kleine-König u.kleine-koe...@pengutronix.de
 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---
 v2:
 - Improved and hopefully clarified documentation.
 - Rearranged variables in struct watchdog_device such that internal variables
   come last.
 - The code now ensures that the watchdog times out timeout seconds after
   the most recent keepalive sent from user space.
 - The internal keepalive now stops silently and no longer generates a
   warning message. Reason is that it will now stop early, while there
   may still be a substantial amount of time for keepalives from user space
   to arrive. If such keepalives arrive late (for example if user space
   is configured to send keepalives just a few seconds before the watchdog
   times out), the message would just be noise and not provide any value.
 ---
  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
  drivers/watchdog/watchdog_dev.c| 140 
 ++---
  include/linux/watchdog.h   |  26 +++--
  3 files changed, 163 insertions(+), 26 deletions(-)
 
 diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
 b/Documentation/watchdog/watchdog-kernel-api.txt
 index d8b0d3367706..25b00b878a7b 100644
 --- a/Documentation/watchdog/watchdog-kernel-api.txt
 +++ b/Documentation/watchdog/watchdog-kernel-api.txt
 @@ -53,9 +53,12 @@ struct watchdog_device {
   unsigned int timeout;
   unsigned int min_timeout;
   unsigned int max_timeout;
 + unsigned int max_hw_timeout_ms;
   void *driver_data;
 - struct mutex lock;
   unsigned long status;
 + struct mutex lock;
 + unsigned long last_keepalive;
 + struct delayed_work work;
   struct list_head deferred;
  };
  
 @@ -73,18 +76,28 @@ It contains following fields:
additional information about the watchdog timer itself. (Like it's unique 
 name)
  * ops: a pointer to the list of watchdog operations that the watchdog 
 supports.
  * timeout: the watchdog timer's timeout value (in seconds).
 +  This is the time after which the system will reboot if user space does
 +  not send a heartbeat request if WDOG_ACTIVE is set.
  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
 +* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
 +  from max_timeout. If set to a value larger than max_timeout, the
 +  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
 +  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
 +  space failed to send a heartbeat for at least 'timeout' seconds.
To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?

  * bootstatus: status of the device after booting (reported with watchdog
WDIOF_* status bits).
  * driver_data: a pointer to the drivers private data of a watchdog device.
This data should only be accessed via the watchdog_set_drvdata and
watchdog_get_drvdata routines.
 -* lock: Mutex for WatchDog Timer Driver Core internal use only.
  * status: this field contains a number of status bits that give extra
information about the status of the device (Like: is the watchdog timer
running/active, is the nowayout bit set, is the device opened via
the /dev/watchdog interface or not, ...).
 +* lock: Mutex for WatchDog Timer Driver Core internal use only.
 +* last_keepalive: Time of most recent keepalive triggered from user space,
 +  in jiffies.
 +* work: Worker data structure for WatchDog Timer Driver Core internal use 
 only.
  * deferred: entry in wtd_deferred_reg_list which is used to
register early initialized watchdogs.
  
 @@ -160,7 +173,11 @@ they are supported. These optional routines/operations 
 are:
and -EIO for could not write value to the watchdog. On success this
routine should set the timeout value of the watchdog_device to the
achieved timeout value (which may be different from the requested one
 -  because the watchdog does not necessarily has a 1 second resolution).
 +  because the watchdog does not necessarily have a 1 second resolution).
 +  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
 +  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
 +  timeout value of the watchdog_device either to the requested timeout value
 +  (if it is larger than hw_max_timeout_ms), or to 

[PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-07 Thread Guenter Roeck
Introduce an optional hardware maximum timeout in the watchdog core.
The hardware maximum timeout can be lower than the maximum timeout.

Drivers can set the maximum hardware timeout value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware timeout,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Cc: Timo Kokkonen 
Cc: Uwe Kleine-König 
Signed-off-by: Guenter Roeck 
---
v2:
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
  come last.
- The code now ensures that the watchdog times out  seconds after
  the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
  warning message. Reason is that it will now stop early, while there
  may still be a substantial amount of time for keepalives from user space
  to arrive. If such keepalives arrive late (for example if user space
  is configured to send keepalives just a few seconds before the watchdog
  times out), the message would just be noise and not provide any value.
---
 Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
 drivers/watchdog/watchdog_dev.c| 140 ++---
 include/linux/watchdog.h   |  26 +++--
 3 files changed, 163 insertions(+), 26 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..25b00b878a7b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,9 +53,12 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+   unsigned int max_hw_timeout_ms;
void *driver_data;
-   struct mutex lock;
unsigned long status;
+   struct mutex lock;
+   unsigned long last_keepalive;
+   struct delayed_work work;
struct list_head deferred;
 };
 
@@ -73,18 +76,28 @@ It contains following fields:
   additional information about the watchdog timer itself. (Like it's unique 
name)
 * ops: a pointer to the list of watchdog operations that the watchdog supports.
 * timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if user space does
+  not send a heartbeat request if WDOG_ACTIVE is set.
 * min_timeout: the watchdog timer's minimum timeout value (in seconds).
 * max_timeout: the watchdog timer's maximum timeout value (in seconds).
+* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
+  from max_timeout. If set to a value larger than max_timeout, the
+  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
+  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
+  space failed to send a heartbeat for at least 'timeout' seconds.
 * bootstatus: status of the device after booting (reported with watchdog
   WDIOF_* status bits).
 * driver_data: a pointer to the drivers private data of a watchdog device.
   This data should only be accessed via the watchdog_set_drvdata and
   watchdog_get_drvdata routines.
-* lock: Mutex for WatchDog Timer Driver Core internal use only.
 * status: this field contains a number of status bits that give extra
   information about the status of the device (Like: is the watchdog timer
   running/active, is the nowayout bit set, is the device opened via
   the /dev/watchdog interface or not, ...).
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
+* last_keepalive: Time of most recent keepalive triggered from user space,
+  in jiffies.
+* work: Worker data structure for WatchDog Timer Driver Core internal use only.
 * deferred: entry in wtd_deferred_reg_list which is used to
   register early initialized watchdogs.
 
@@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
   and -EIO for "could not write value to the watchdog". On success this
   routine should set the timeout value of the watchdog_device to the
   achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily has a 1 second resolution).
+  because the watchdog does not necessarily have a 1 second resolution).
+  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
+  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
+  timeout value of the watchdog_device either to the requested timeout value
+  (if it is larger than hw_max_timeout_ms), or to the achieved timeout value.
   (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
   watchdog's info structure).
 * get_timeleft: this routines returns the time that's left before a reset.
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 06171c73daf5..c04ba1a98cc8 100644
--- 

[PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-07 Thread Guenter Roeck
Introduce an optional hardware maximum timeout in the watchdog core.
The hardware maximum timeout can be lower than the maximum timeout.

Drivers can set the maximum hardware timeout value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware timeout,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Cc: Timo Kokkonen timo.kokko...@offcode.fi
Cc: Uwe Kleine-König u.kleine-koe...@pengutronix.de
Signed-off-by: Guenter Roeck li...@roeck-us.net
---
v2:
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
  come last.
- The code now ensures that the watchdog times out timeout seconds after
  the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
  warning message. Reason is that it will now stop early, while there
  may still be a substantial amount of time for keepalives from user space
  to arrive. If such keepalives arrive late (for example if user space
  is configured to send keepalives just a few seconds before the watchdog
  times out), the message would just be noise and not provide any value.
---
 Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
 drivers/watchdog/watchdog_dev.c| 140 ++---
 include/linux/watchdog.h   |  26 +++--
 3 files changed, 163 insertions(+), 26 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt 
b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..25b00b878a7b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,9 +53,12 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+   unsigned int max_hw_timeout_ms;
void *driver_data;
-   struct mutex lock;
unsigned long status;
+   struct mutex lock;
+   unsigned long last_keepalive;
+   struct delayed_work work;
struct list_head deferred;
 };
 
@@ -73,18 +76,28 @@ It contains following fields:
   additional information about the watchdog timer itself. (Like it's unique 
name)
 * ops: a pointer to the list of watchdog operations that the watchdog supports.
 * timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if user space does
+  not send a heartbeat request if WDOG_ACTIVE is set.
 * min_timeout: the watchdog timer's minimum timeout value (in seconds).
 * max_timeout: the watchdog timer's maximum timeout value (in seconds).
+* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
+  from max_timeout. If set to a value larger than max_timeout, the
+  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
+  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
+  space failed to send a heartbeat for at least 'timeout' seconds.
 * bootstatus: status of the device after booting (reported with watchdog
   WDIOF_* status bits).
 * driver_data: a pointer to the drivers private data of a watchdog device.
   This data should only be accessed via the watchdog_set_drvdata and
   watchdog_get_drvdata routines.
-* lock: Mutex for WatchDog Timer Driver Core internal use only.
 * status: this field contains a number of status bits that give extra
   information about the status of the device (Like: is the watchdog timer
   running/active, is the nowayout bit set, is the device opened via
   the /dev/watchdog interface or not, ...).
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
+* last_keepalive: Time of most recent keepalive triggered from user space,
+  in jiffies.
+* work: Worker data structure for WatchDog Timer Driver Core internal use only.
 * deferred: entry in wtd_deferred_reg_list which is used to
   register early initialized watchdogs.
 
@@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
   and -EIO for could not write value to the watchdog. On success this
   routine should set the timeout value of the watchdog_device to the
   achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily has a 1 second resolution).
+  because the watchdog does not necessarily have a 1 second resolution).
+  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
+  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
+  timeout value of the watchdog_device either to the requested timeout value
+  (if it is larger than hw_max_timeout_ms), or to the achieved timeout value.
   (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
   watchdog's info structure).
 * get_timeleft: this routines returns the time that's left before a reset.
diff --git a/drivers/watchdog/watchdog_dev.c