Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-11-05 Thread renayama19661014
Hi Klaus,
Hi Jan,
Hi All,

Regarding the watchdog using the WD service, there seem to be no objections.
I will start working on an official patch next week.

Best Regards,
Hideo Yamauchi.


- Original Message -
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> <users@clusterlabs.org>
> Cc: 
> Date: 2016/10/26, Wed 17:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is 
> frozen, cluster decisions are delayed infinitely
> 
> Hi Klaus,
> Hi Jan,
> Hi All,
> 
> Our team discussed the watchdog using the WD service.
> 
> 1) The WD service will not be removed.
> 2) It can also be used with pacemaker_remote by starting corosync on localhost.
> 3) Contention for the watchdog device needs to be taken into account.
> 4) Because we are targeting setups that do not use sbd, I am not considering 
> adding a corosync-API-like interface to sbd for the moment.
> 
> The user would choose either the method using sbd or the method using the 
> WD service.
> Having two methods may cause some confusion, but the WD service is valuable 
> for users who do not use sbd.
> 
> We want to include the watchdog using the WD service in Pacemaker.
> I intend to make an official patch.
> 
> What do you think?
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> - Original Message -
>>  From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>  To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>  Cc: 
>>  Date: 2016/10/20, Thu 19:08
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>> 
>>  Hi Klaus,
>>  Hi Jan,
>> 
>>  Thank you for your comments.
>> 
>>  I will wait a little longer for other comments.
>>  We will discuss this matter next week.
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>>  - Original Message -----
>>>   From: Jan Friesse <jfrie...@redhat.com>
>>>   To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>   Cc: 
>>>   Date: 2016/10/20, Thu 15:46
>>>   Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> 
>>>> 
>>>>    On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:
>>>>>    Hi Klaus,
>>>>>    Hi All,
>>>>> 
>>>>>    I tried a prototype of the watchdog using the WD service.
>>>>>      - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>>> 
>>>>>    Please comment.
>>>>    Thank you Hideo for providing the prototype.
>>>>    Added the patch to my build and it seems to
>>>>    be working as expected.
>>>> 
>>>>    A few thoughts triggered by this approach:
>>>> 
>>>>    - we have to alert the corosync-people as in
>>>>       a chat with Jan Friesse he pointed me to the
>>>>       fact that for corosync 3.x the wd-service was
>>>>       planned to be removed
>>> 
>>>   Actually I didn't express myself correctly. What I wanted to say was 
>>>   "I'm considering the idea of removing it", simply because it's 
>>>   disabled downstream.
>>> 
>>>   BUT keep in mind that removing functionality means asking the community 
>>>   to find out whether somebody is actively using it.
>>> 
>>>   And because there are active users and a future use case, removing wd 
>>>   is not an option.
>>> 
>>> 
>>>> 
>>>>       especially delicate as the binding is very loose
>>>>       so that - as is - it builds against a corosync with
>>>>       disabled wd-service without any complaints...
>>>> 
>>>>    - as of now if you enable wd-service in the
>>>>       corosync-build it is on by default and would
>>>>       be hogging the watchdog presumably
>>>>       (there is apparently a pull request that makes
>>>>       it default to off)
>>>> 
>>>>    - with my thoughts about adding an API to
>>>>       sbd previously in the thread I was trying to

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-26 Thread renayama19661014
Hi Klaus,
Hi Jan,
Hi All,

Our team discussed the watchdog using the WD service.

1) The WD service will not be removed.
2) It can also be used with pacemaker_remote by starting corosync on localhost.
3) Contention for the watchdog device needs to be taken into account.
4) Because we are targeting setups that do not use sbd, I am not considering 
adding a corosync-API-like interface to sbd for the moment.

The user would choose either the method using sbd or the method using the WD service.
Having two methods may cause some confusion, but the WD service is valuable for 
users who do not use sbd.
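Because a node's hardware watchdog can normally be owned by only one component (on Linux a second open of a watchdog device fails with EBUSY), the choice between sbd and the WD service described above is effectively exclusive. A minimal sketch, assuming a configuration check run before either component starts; `WatchdogOwner`, `choose_watchdog_owner` and both flags are hypothetical names, not real Pacemaker, corosync or sbd options:

```python
from enum import Enum


class WatchdogOwner(Enum):
    """Which component opens the node's hardware watchdog (hypothetical)."""
    NONE = "none"
    SBD = "sbd"
    COROSYNC_WD = "corosync-wd-service"


def choose_watchdog_owner(sbd_enabled: bool, corosync_wd_enabled: bool) -> WatchdogOwner:
    """Return the single component allowed to own the watchdog.

    The two flags are illustrative only. A Linux watchdog device can
    normally be opened by one process at a time, so enabling both
    consumers is rejected up front instead of letting them race for the
    device at startup.
    """
    if sbd_enabled and corosync_wd_enabled:
        raise ValueError("sbd and the corosync WD service would both claim the watchdog")
    if sbd_enabled:
        return WatchdogOwner.SBD
    if corosync_wd_enabled:
        return WatchdogOwner.COROSYNC_WD
    return WatchdogOwner.NONE


if __name__ == "__main__":
    print(choose_watchdog_owner(sbd_enabled=False, corosync_wd_enabled=True))
    # WatchdogOwner.COROSYNC_WD
```

Failing early keeps the "two methods" from silently fighting over the device, which is one way to address the confusion mentioned above.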

We want to include the watchdog using the WD service in Pacemaker.
I intend to make an official patch.

What do you think?

Best Regards,
Hideo Yamauchi.



- Original Message -
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> <users@clusterlabs.org>
> Cc: 
> Date: 2016/10/20, Thu 19:08
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is 
> frozen, cluster decisions are delayed infinitely
> 
> Hi Klaus,
> Hi Jan,
> 
> Thank you for your comments.
> 
> I will wait a little longer for other comments.
> We will discuss this matter next week.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> - Original Message -
>>  From: Jan Friesse <jfrie...@redhat.com>
>>  To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>  Cc: 
>>  Date: 2016/10/20, Thu 15:46
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>> 
>>> 
>>>   On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:
>>>>   Hi Klaus,
>>>>   Hi All,
>>>> 
>>>>   I tried a prototype of the watchdog using the WD service.
>>>>     - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>> 
>>>>   Please comment.
>>>   Thank you Hideo for providing the prototype.
>>>   Added the patch to my build and it seems to
>>>   be working as expected.
>>> 
>>>   A few thoughts triggered by this approach:
>>> 
>>>   - we have to alert the corosync-people as in
>>>      a chat with Jan Friesse he pointed me to the
>>>      fact that for corosync 3.x the wd-service was
>>>      planned to be removed
>> 
>>  Actually I didn't express myself correctly. What I wanted to say was 
>>  "I'm considering the idea of removing it", simply because it's 
>>  disabled downstream.
>> 
>>  BUT keep in mind that removing functionality means asking the community 
>>  to find out whether somebody is actively using it.
>> 
>>  And because there are active users and a future use case, removing wd 
>>  is not an option.
>> 
>> 
>>> 
>>>      especially delicate as the binding is very loose
>>>      so that - as is - it builds against a corosync with
>>>      disabled wd-service without any complaints...
>>> 
>>>   - as of now if you enable wd-service in the
>>>      corosync-build it is on by default and would
>>>      be hogging the watchdog presumably
>>>      (there is apparently a pull request that makes
>>>      it default to off)
>>> 
>>>   - with my thoughts about adding an API to
>>>      sbd previously in the thread I was trying to
>>>      target closer observation of pacemaker_remoted
>>>      as well (remote-nodes don't have corosync
>>>      running)
>>> 
>>>      I guess it would be possible to run corosync
>>>      with a static config as single-node cluster
>>>      bound to localhost for that purpose.
>>> 
>>>      I read the thread about corosync-remote and
>>>      that happening might make the special-handling
>>>      for pacemaker-remote obsolete anyway ...
>>> 
>>>   - to enable the approach to live alongside
>>>      sbd it would be possible to make sbd use
>>>      the corosync-API as well for watchdog purposes
>>>      instead of opening the watchdog directly
>>> 
>>>      This shouldn't be a big deal, as the part of sbd used to
>>>      observe a pacemaker-node as cluster-watcher
>>>      (the part that sends cpg-pings to corosync)
>>>      already builds against corosync.
>>>      The blockdevice-part of sbd being b

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-20 Thread Jan Friesse


> On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:
>> Hi Klaus,
>> Hi All,
>>
>> I tried a prototype of the watchdog using the WD service.
>>   - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>
>> Please comment.
> Thank you Hideo for providing the prototype.
> Added the patch to my build and it seems to
> be working as expected.
>
> A few thoughts triggered by this approach:
>
> - we have to alert the corosync-people as in
>    a chat with Jan Friesse he pointed me to the
>    fact that for corosync 3.x the wd-service was
>    planned to be removed


Actually I didn't express myself correctly. What I wanted to say was 
"I'm considering the idea of removing it", simply because it's disabled 
downstream.

BUT keep in mind that removing functionality means asking the community to 
find out whether somebody is actively using it.

And because there are active users and a future use case, removing wd is 
not an option.

>    especially delicate as the binding is very loose
>    so that - as is - it builds against a corosync with
>    disabled wd-service without any complaints...
>
> - as of now if you enable wd-service in the
>    corosync-build it is on by default and would
>    be hogging the watchdog presumably
>    (there is apparently a pull request that makes
>    it default to off)
>
> - with my thoughts about adding an API to
>    sbd previously in the thread I was trying to
>    target closer observation of pacemaker_remoted
>    as well (remote-nodes don't have corosync
>    running)
>
>    I guess it would be possible to run corosync
>    with a static config as single-node cluster
>    bound to localhost for that purpose.
>
>    I read the thread about corosync-remote and
>    that happening might make the special-handling
>    for pacemaker-remote obsolete anyway ...
>
> - to enable the approach to live alongside
>    sbd it would be possible to make sbd use
>    the corosync-API as well for watchdog purposes
>    instead of opening the watchdog directly
>
>    This shouldn't be a big deal, as the part of sbd used to
>    observe a pacemaker-node as cluster-watcher
>    (the part that sends cpg-pings to corosync)
>    already builds against corosync.
>    The blockdevice-part of sbd being basically
>    generic it might be an issue though.
>
> Regards,
> Klaus
>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> - Original Message -
>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>> To: "users@clusterlabs.org" <users@clusterlabs.org>
>>> Cc:
>>> Date: 2016/10/11, Tue 17:58
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>> Hi Klaus,
>>>
>>> Thank you for your comments.
>>>
>>> I will make a prototype patch using the WD service.
>>>
>>> Please wait a little.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>> - Original Message -
>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>> To: users@clusterlabs.org
>>>> Cc:
>>>> Date: 2016/10/10, Mon 21:03
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>
>>>> On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
>>>>> Hi All,
>>>>>
>>>>> Our users do not necessarily use sbd.
>>>>>
>>>>> I confirmed that corosync's WD service is one method that does not
>>>>> require sbd.
>>>>> Using CMAP, Pacemaker can have its processes monitored by the WD
>>>>> service, which then acts as a watchdog.
>>>>
>>>> Have to have a look at that...
>>>> But if we establish some in-between-layer in pacemaker we could have this
>>>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>>> a watchdog-device directly, ...
>>>>
>>>>>
>>>>> We can prepare a patch for Pacemaker.
>>>>
>>>> Always helpful to discuss/clarify an idea once some code is available ...
>>>>
>>>>> Has the discussion about using the WD service concluded?
>>>>
>>>> Not from my pov. Just a day off ;-)
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> - Original Message -
>>>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>>>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>>>>>> Cc:
>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>
>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>>>>> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of crmd
>>>>>>>>>>> being stopped by SIGSTOP?
>>>>>>>>>>
>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>>>>> stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>>> Thank you for your comment.
>>>>>>>>>
>>>>>>>>> We will also look into a watchdog for crmd.
>>>>>>>>> I will comment again once our investigation has progressed.
>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>
>>>>>>>> a couple of thoughts

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-14 Thread renayama19661014
Hi Klaus,
Hi All,

I tried a prototype of the watchdog using the WD service.
 - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9

Please comment.


Best Regards,
Hideo Yamauchi.


- Original Message -
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: "users@clusterlabs.org" <users@clusterlabs.org>
> Cc: 
> Date: 2016/10/11, Tue 17:58
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is 
> frozen, cluster decisions are delayed infinitely
> 
> Hi Klaus,
> 
> Thank you for your comments.
> 
> I will make a prototype patch using the WD service.
> 
> Please wait a little.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> 
> - Original Message -
>>  From: Klaus Wenninger <kwenn...@redhat.com>
>>  To: users@clusterlabs.org
>>  Cc: 
>>  Date: 2016/10/10, Mon 21:03
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>> 
>>  On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
>>>   Hi All,
>>> 
>>>   Our users do not necessarily use sbd.
>>> 
>>>   I confirmed that corosync's WD service is one method that does not 
>>>   require sbd.
>>>   Using CMAP, Pacemaker can have its processes monitored by the WD 
>>>   service, which then acts as a watchdog.
>> 
>>  Have to have a look at that...
>>  But if we establish some in-between-layer in pacemaker we could have this
>>  as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>  a watchdog-device directly, ...
>> 
>>> 
>>> 
>>>   We can prepare a patch for Pacemaker.
>> 
>>  Always helpful to discuss/clarify an idea once some code is available ...
>> 
>>>   Has the discussion about using the WD service concluded?
>> 
>>  Not from my pov. Just a day off ;-)
>> 
>>> 
>>> 
>>>   Best Regards,
>>>   Hideo Yamauchi.
>>> 
>>> 
>>>   - Original Message -
>>>>   From: Klaus Wenninger <kwenn...@redhat.com>
>>>>   To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>>>>   Cc: 
>>>>   Date: 2016/10/7, Fri 17:47
>>>>   Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>> 
>>>>   On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>    Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>    message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>>>    On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>    Hi All,
>>>>>>> 
>>>>>>>>>    If a user uses sbd, can the cluster avoid the problem of crmd
>>>>>>>>>    being stopped by SIGSTOP?
>>>>>>>>    
>>>>>>>>    As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>>>    stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>    Thank you for your comment.
>>>>>>> 
>>>>>>>    We will also look into a watchdog for crmd.
>>>>>>>    I will comment again once our investigation has progressed.
>>>>>>    Was thinking of doing a small test implementation going
>>>>>>    a little in the direction Lars Ellenberg had been pointing out.
>>>>>> 
>>>>>>    a couple of thoughts I had so far:
>>>>>> 
>>>>>>    - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>      an application can use to create a watchdog within sbd
>>>>>    Why does it have to be done within sbd?
>>>>   Not necessarily; it could be split out into a project of its own as
>>>>   well, or something already existing could be used.
>>>>   I remember having added a dbus-interface to
>>>>   https://sourceforge.net/projects/watchdog/ for a project once.
>>>>   If you have a suggestion I'm open.
>>>>   Going off sbd would have the advantage of a smooth start:
>>>> 
>>>>   - cluster/pacemaker-watcher are there already and can
>>>>     be replaced/moved ov

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-11 Thread renayama19661014
Hi Klaus,

Thank you for your comments.

I will make a prototype patch using the WD service.

Please wait a little.

Best Regards,
Hideo Yamauchi.




- Original Message -
> From: Klaus Wenninger <kwenn...@redhat.com>
> To: users@clusterlabs.org
> Cc: 
> Date: 2016/10/10, Mon 21:03
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is 
> frozen, cluster decisions are delayed infinitely
> 
> On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
>>  Hi All,
>> 
>>  Our users do not necessarily use sbd.
>> 
>>  I confirmed that corosync's WD service is one method that does not 
>>  require sbd.
>>  Using CMAP, Pacemaker can have its processes monitored by the WD 
>>  service, which then acts as a watchdog.
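To illustrate the WD-service mechanism mentioned above, here is a minimal Python model. It is not the libcmap API and not corosync's implementation: `CmapModel`, `register_process`, `heartbeat` and `stale_processes` are hypothetical, and the key layout is modeled loosely on the `resources.process.<name>.*` CMAP entries (`state`, `poll_period`, `last_updated`) as we understand them; treat the exact key names, value types and units as assumptions. The monitored daemon refreshes `last_updated`; the monitor reports any running entry whose heartbeat is older than its poll period, at which point a WD-service-like component would stop feeding the hardware watchdog.

```python
from typing import Dict, List, Union

Value = Union[str, float]

PREFIX = "resources.process."  # assumed key layout, see lead-in


class CmapModel:
    """In-memory stand-in for CMAP's key/value store; NOT the libcmap API."""

    def __init__(self) -> None:
        self.keys: Dict[str, Value] = {}

    def set(self, key: str, value: Value) -> None:
        self.keys[key] = value

    def get(self, key: str) -> Value:
        return self.keys[key]


def register_process(cmap: CmapModel, name: str, poll_period: float, now: float) -> None:
    """What a daemon such as crmd would do once at startup (simplified)."""
    base = f"{PREFIX}{name}."
    cmap.set(base + "state", "running")
    cmap.set(base + "poll_period", poll_period)  # seconds in this model
    cmap.set(base + "last_updated", now)


def heartbeat(cmap: CmapModel, name: str, now: float) -> None:
    """What the daemon would do periodically from its main loop."""
    cmap.set(f"{PREFIX}{name}.last_updated", now)


def stale_processes(cmap: CmapModel, now: float) -> List[str]:
    """Running entries whose last heartbeat is older than their poll period.

    A WD-service-like monitor would stop feeding the hardware watchdog as
    soon as this list is non-empty, so the node gets reset.
    """
    stale = []
    for key, value in cmap.keys.items():
        if not (key.startswith(PREFIX) and key.endswith(".state")) or value != "running":
            continue
        base = key[: -len("state")]  # "resources.process.<name>."
        name = base[len(PREFIX):-1]
        age = now - float(cmap.get(base + "last_updated"))
        if age > float(cmap.get(base + "poll_period")):
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    cmap = CmapModel()
    register_process(cmap, "crmd", poll_period=5.0, now=0.0)
    heartbeat(cmap, "crmd", now=4.0)
    print(stale_processes(cmap, now=8.0))   # []
    # crmd is frozen by SIGSTOP from here on, so no further heartbeats arrive.
    print(stale_processes(cmap, now=10.0))  # ['crmd']
```

The point of the design is that a frozen daemon cannot update its own entry, so the failure is detected by the absence of activity rather than by the daemon reporting anything.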
> 
> Have to have a look at that...
> But if we establish some in-between-layer in pacemaker we could have this
> as one of the possibilities besides e.g. sbd (with enhanced API), going for
> a watchdog-device directly, ...
> 
>> 
>> 
>>  We can prepare a patch for Pacemaker.
> 
> Always helpful to discuss/clarify an idea once some code is available ...
> 
>>  Has the discussion about using the WD service concluded?
> 
> Not from my pov. Just a day off ;-)
> 
>> 
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>>  - Original Message -
>>>  From: Klaus Wenninger <kwenn...@redhat.com>
>>>  To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>>>  Cc: 
>>>  Date: 2016/10/7, Fri 17:47
>>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> 
>>>  On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>   Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>   message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>>   On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>   Hi All,
>>>>>> 
>>>>>>>>   If a user uses sbd, can the cluster avoid the problem of crmd
>>>>>>>>   being stopped by SIGSTOP?
>>>>>>>   
>>>>>>>   As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>>   stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>   Thank you for your comment.
>>>>>> 
>>>>>>   We will also look into a watchdog for crmd.
>>>>>>   I will comment again once our investigation has progressed.
>>>>>   Was thinking of doing a small test implementation going
>>>>>   a little in the direction Lars Ellenberg had been pointing out.
>>>>> 
>>>>>   a couple of thoughts I had so far:
>>>>> 
>>>>>   - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>     an application can use to create a watchdog within sbd
>>>>   Why does it have to be done within sbd?
>>>  Not necessarily; it could be split out into a project of its own as
>>>  well, or something already existing could be used.
>>>  I remember having added a dbus-interface to
>>>  https://sourceforge.net/projects/watchdog/ for a project once.
>>>  If you have a suggestion I'm open.
>>>  Going off sbd would have the advantage of a smooth start:
>>> 
>>>  - cluster/pacemaker-watcher are there already and can
>>>    be replaced/moved over time
>>>  - the lifecycle of the daemon (when started/stopped) is
>>>    already something that is in the code and in the people's minds
>>> 
>>>>>   - parameters for the first are a name and a timeout
>>>>> 
>>>>>   - first use-case would be crmd observation
>>>>> 
>>>>>   - later on we could think of removing pacemaker dependencies
>>>>>     from sbd by moving the actual implementation of
>>>>>     pacemaker-watcher and probably cluster-watcher as well
>>>>>     into pacemaker - using the new API
>>>>> 
>>>>>   - this of course creates sbd dependency within pacemaker so
>>>>>     that it would make sense to offer a simpler and self-contained
>>>>>     implementation within pacemaker as an alternative
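The API proposed above (an application creates a named watchdog with a timeout inside a relay such as sbd, with crmd as the first user) can be pictured as a multiplexer: many software watchdogs, one hardware watchdog, and the hardware is only fed while every client is fresh. This is a hypothetical Python model with invented names (`WatchdogRelay`, `create`, `feed`, `tick`), not sbd code; time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class _Client:
    timeout: float   # seconds the client may stay silent
    last_fed: float  # time of the client's last feed


@dataclass
class WatchdogRelay:
    """Hypothetical relay multiplexing named watchdogs onto one hardware watchdog."""

    pet_hardware: Callable[[], None]
    clients: Dict[str, _Client] = field(default_factory=dict)

    def create(self, name: str, timeout: float, now: float) -> None:
        # Mirrors the proposal: the parameters are a name and a timeout.
        self.clients[name] = _Client(timeout=timeout, last_fed=now)

    def feed(self, name: str, now: float) -> None:
        self.clients[name].last_fed = now

    def remove(self, name: str) -> None:
        # Orderly stop of a client: stop watching it instead of fencing.
        self.clients.pop(name, None)

    def expired(self, now: float) -> List[str]:
        return sorted(n for n, c in self.clients.items() if now - c.last_fed > c.timeout)

    def tick(self, now: float) -> bool:
        """Run periodically by the relay's own main loop.

        Feeds the hardware watchdog only if no client has expired; otherwise
        it deliberately stops feeding, so the hardware resets the node once
        its own timeout elapses. Returns whether the hardware was fed.
        """
        if self.expired(now):
            return False
        self.pet_hardware()
        return True


if __name__ == "__main__":
    pets: List[float] = []
    relay = WatchdogRelay(pet_hardware=lambda: pets.append(1.0))
    relay.create("crmd", timeout=10.0, now=0.0)
    relay.create("pacemakerd", timeout=10.0, now=0.0)
    print(relay.tick(5.0))           # True: every client is fresh
    relay.feed("pacemakerd", 12.0)   # crmd is frozen (SIGSTOP) and never feeds
    print(relay.expired(12.0))       # ['crmd']
    print(relay.tick(12.0))          # False: hardware is no longer fed
```

This is also why the number of hardware watchdogs is not a limit here: only the relay touches the device, and each client costs just one entry in `clients`.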
>>>>   I think the watchdog interface is so simple that you don't need a relay
>>>>   for it. The only limit I can imagine is the number of watchdogs availa

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-10 Thread Klaus Wenninger
On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> Our users do not necessarily use sbd.
>
> I confirmed that corosync's WD service is one method that does not 
> require sbd.
> Using CMAP, Pacemaker can have its processes monitored by the WD 
> service, which then acts as a watchdog.

Have to have a look at that...
But if we establish some in-between-layer in pacemaker we could have this
as one of the possibilities besides e.g. sbd (with enhanced API), going for
a watchdog-device directly, ...
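One way to picture such an in-between layer is a small backend interface that Pacemaker code would call, with interchangeable implementations behind it. The sketch below is hypothetical (Python, invented class names). `DeviceBackend` talks to a watchdog device directly using the documented Linux semantics: any write is a keep-alive, and writing `V` just before closing disarms the timer if the driver supports "magic close" and nowayout is not set. `RelayBackend` stands in for sbd with an enhanced API or corosync's WD service; their actual IPC is left as placeholder callables.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Callable


class WatchdogBackend(ABC):
    """The in-between layer: callers only ever use feed() and close()."""

    @abstractmethod
    def feed(self) -> None:
        """Signal that the monitored daemon (e.g. crmd) is still alive."""

    @abstractmethod
    def close(self) -> None:
        """Orderly shutdown, so a clean stop does not reset the node."""


class DeviceBackend(WatchdogBackend):
    """Owns a watchdog device directly, e.g. open("/dev/watchdog", "wb", buffering=0)."""

    def __init__(self, dev: BinaryIO) -> None:
        self.dev = dev

    def feed(self) -> None:
        # Linux watchdog API: any write to the device is a keep-alive.
        self.dev.write(b"\0")
        self.dev.flush()

    def close(self) -> None:
        # "Magic close": writing 'V' before close disarms the timer,
        # unless the driver lacks support for it or nowayout is set.
        self.dev.write(b"V")
        self.dev.flush()
        self.dev.close()


class RelayBackend(WatchdogBackend):
    """Delegates to another component (sbd with an enhanced API, corosync's
    WD service, ...). Both callables are hypothetical placeholders for
    whatever IPC that component would offer."""

    def __init__(self, heartbeat: Callable[[], None], unregister: Callable[[], None]) -> None:
        self._heartbeat = heartbeat
        self._unregister = unregister

    def feed(self) -> None:
        self._heartbeat()

    def close(self) -> None:
        self._unregister()


if __name__ == "__main__":
    fake_dev = io.BytesIO()  # stands in for the real device in this demo
    backend: WatchdogBackend = DeviceBackend(fake_dev)
    backend.feed()
    print(fake_dev.getvalue())  # b'\x00'
    backend.close()
```

With this split, the choice between sbd, the WD service and direct device access becomes a matter of which backend is constructed at startup, while the code that feeds the watchdog stays the same.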
 
>
>
> We can prepare a patch for Pacemaker.

Always helpful to discuss/clarify an idea once some code is available ...

> Has the discussion about using the WD service concluded?

Not from my pov. Just a day off ;-)

>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> - Original Message -
>> From: Klaus Wenninger <kwenn...@redhat.com>
>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>> Cc: 
>> Date: 2016/10/7, Fri 17:47
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is 
>> frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>  Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>  message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>  On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>  Hi All,
>>>>>
>>>>>>>  If a user uses sbd, can the cluster avoid the problem of crmd
>>>>>>>  being stopped by SIGSTOP?
>>>>>>   
>>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>  stopping crmd will reboot the node (unless the watchdog fails).
>>>>>  Thank you for your comment.
>>>>>
>>>>>  We will also look into a watchdog for crmd.
>>>>>  I will comment again once our investigation has progressed.
>>>>  Was thinking of doing a small test implementation going
>>>>  a little in the direction Lars Ellenberg had been pointing out.
>>>>
>>>>  a couple of thoughts I had so far:
>>>>
>>>>  - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>    an application can use to create a watchdog within sbd
>>>  Why does it have to be done within sbd?
>> Not necessarily; it could be split out into a project of its own as
>> well, or something already existing could be used.
>> I remember having added a dbus-interface to
>> https://sourceforge.net/projects/watchdog/ for a project once.
>> If you have a suggestion I'm open.
>> Going off sbd would have the advantage of a smooth start:
>>
>> - cluster/pacemaker-watcher are there already and can
>>   be replaced/moved over time
>> - the lifecycle of the daemon (when started/stopped) is
>>   already something that is in the code and in the people's minds
>>
>>>>  - parameters for the first are a name and a timeout
>>>>
>>>>  - first use-case would be crmd observation
>>>>
>>>>  - later on we could think of removing pacemaker dependencies
>>>>    from sbd by moving the actual implementation of
>>>>    pacemaker-watcher and probably cluster-watcher as well
>>>>    into pacemaker - using the new API
>>>>
>>>>  - this of course creates sbd dependency within pacemaker so
>>>>    that it would make sense to offer a simpler and self-contained
>>>>    implementation within pacemaker as an alternative
>>>  I think the watchdog interface is so simple that you don't need a relay
>>>  for it. The only limit I can imagine is the number of watchdogs available
>>>  on some specific hardware.
>> That is the point ;-)
>>>>    thus it would be favorable to have the dependency
>>>>    within a non-compulsory pacemaker-rpm so that
>>>>    we can offer an alternative that doesn't use sbd
>>>>    at maybe the cost of being less reliable or one
>>>>    that owns a hardware-watchdog by itself for systems
>>>>    where this is still unused.
>>>>
>>>>    - e.g. via some kind of plugin (Andrew forgive me -
>>>>      no pils ;-) )
>>>>    - or via an additional daemon
>>>>
>>>>  What did you have in mind?
>>>>  Maybe it makes sense to synchronize...
>>>>
>>>>  Regards,
>>>>  Klaus
>>>>   
>>>>>  Best