Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

renayama19661014 Fri, 14 Oct 2016 02:25:29 -0700

Hi Klaus,
Hi All,

I have written a prototype of a watchdog using the WD service:
 - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9


Please comment.


Best Regards,
Hideo Yamauchi.
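
The mechanism discussed in this thread can be illustrated with a minimal Python sketch. The idea is that a process such as crmd must feed a watchdog within a timeout, so a SIGSTOPped crmd is detected. This sketch is not the linked prototype, and it is not the API of the corosync WD service, sbd or pacemaker. The `SoftWatchdog` class and its `register`/`feed`/`expired` methods are hypothetical. They are loosely modelled on Klaus's suggestion further down, which describes a watchdog created with a name and a timeout. The clock can be injected so the timing can be shown deterministically.

```python
import time
from typing import Callable, Dict, List


class SoftWatchdog:
    """Hypothetical watchdog registry: each client registers a name and a
    timeout and must call feed() before the timeout elapses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                    # injectable for testing
        self._timeouts: Dict[str, float] = {}
        self._deadlines: Dict[str, float] = {}

    def register(self, name: str, timeout: float) -> None:
        """Create a watchdog entry; the first deadline starts now."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._timeouts[name] = timeout
        self._deadlines[name] = self._clock() + timeout

    def feed(self, name: str) -> None:
        """Push the deadline forward. A frozen (e.g. SIGSTOPped) client
        never reaches this call, so its deadline eventually passes."""
        self._deadlines[name] = self._clock() + self._timeouts[name]

    def expired(self) -> List[str]:
        """Names whose deadline has passed; a real observer would then
        escalate, e.g. reboot/self-fence the node (not shown here)."""
        now = self._clock()
        return sorted(n for n, d in self._deadlines.items() if now >= d)


# Usage with a fake clock standing in for time.monotonic().
fake_now = [0.0]
wd = SoftWatchdog(clock=lambda: fake_now[0])
wd.register("crmd", timeout=10.0)   # deadline at t=10
fake_now[0] = 5.0
wd.feed("crmd")                     # healthy crmd: deadline moves to t=15
fake_now[0] = 14.0
print(wd.expired())                 # []
fake_now[0] = 15.0                  # crmd frozen since t=5, no more feeds
print(wd.expired())                 # ['crmd']
```

A software observer like this only helps while the observer itself keeps running. As the thread notes, a self-contained alternative to sbd may be less reliable unless it is backed by a hardware watchdog.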


----- Original Message -----
> From: "[email protected]" <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: 
> Date: 2016/10/11, Tue 17:58
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
> 
> Hi Klaus,
> 
> Thank you for the comment.
> 
> I am making a prototype patch that uses the WD service.
> 
> Please wait a little.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> 
> ----- Original Message -----
>>  From: Klaus Wenninger <[email protected]>
>>  To: [email protected]
>>  Cc: 
>>  Date: 2016/10/10, Mon 21:03
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>> 
>>  On 10/07/2016 11:10 PM, [email protected] wrote:
>>>   Hi All,
>>> 
>>>   Our users may not necessarily use sbd.
>>> 
>>>   I confirmed that using the WD service of corosync is one
>>>   method that does not require sbd.
>>>   Via CMAP, pacemaker can have its own processes watched by the
>>>   WD service and can thus provide watchdog functionality.
>> 
>>  I have to have a look at that...
>>  But if we establish some intermediate layer in pacemaker, we could offer this
>>  as one of the possibilities besides, e.g., sbd (with an enhanced API) or
>>  going for a watchdog device directly, ...
>> 
>>> 
>>> 
>>>   We can prepare a patch for pacemaker.
>> 
>>  Always helpful to discuss/clarify an idea once some code is available ...
>> 
>>>   Has the discussion about using the WD service ended?
>> 
>>  Not from my pov. Just a day off ;-)
>> 
>>> 
>>> 
>>>   Best Regards,
>>>   Hideo Yamauchi.
>>> 
>>> 
>>>   ----- Original Message -----
>>>>   From: Klaus Wenninger <[email protected]>
>>>>   To: Ulrich Windl <[email protected]>; [email protected]
>>>>   Cc: 
>>>>   Date: 2016/10/7, Fri 17:47
>>>>   Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>> 
>>>>   On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>    Klaus Wenninger <[email protected]> wrote on 06.10.2016 at 18:03 in
>>>>>    message <[email protected]>:
>>>>>>    On 10/05/2016 04:22 PM, [email protected] wrote:
>>>>>>>    Hi All,
>>>>>>> 
>>>>>>>>>    If a user uses sbd, can the cluster avoid the problem caused by SIGSTOP of crmd?
>>>>>>>>    
>>>>>>>>    As pointed out earlier, maybe crmd should feed a watchdog. Then stopping
>>>>>>>>    crmd will reboot the node (unless the watchdog fails).
>>>>>>>    Thank you for the comment.
>>>>>>> 
>>>>>>>    We are examining a watchdog for crmd, too.
>>>>>>>    I will comment further once our examination has progressed.
>>>>>>    I was thinking of doing a small test implementation going
>>>>>>    a little in the direction Lars Ellenberg had pointed out.
>>>>>> 
>>>>>>    a couple of thoughts I had so far:
>>>>>> 
>>>>>>    - add an API to sbd (via DBus or libqb, favoring libqb atm) that
>>>>>>      an application can use to create a watchdog within sbd
>>>>>    Why does it have to be done within sbd?
>>>>   Not necessarily; it could be spun out into a project of its own,
>>>>   or something already existing could be used.
>>>>   I remember having added a dbus-interface to
>>>>   https://sourceforge.net/projects/watchdog/ for a project once.
>>>>   If you have a suggestion, I'm open.
>>>>   Building on sbd would have the advantage of a smooth start:
>>>> 
>>>>   - cluster/pacemaker-watcher are already there and can
>>>>     be replaced/moved over time
>>>>   - the lifecycle of the daemon (when it is started/stopped) is
>>>>     already in the code and in people's minds
>>>> 
>>>>>>    - parameters for a start would be a name and a timeout
>>>>>> 
>>>>>>    - first use-case would be crmd observation
>>>>>> 
>>>>>>    - later on we could think of removing pacemaker dependencies
>>>>>>      from sbd by moving the actual implementation of
>>>>>>      pacemaker-watcher and probably cluster-watcher as well
>>>>>>      into pacemaker - using the new API
>>>>>> 
>>>>>>    - this of course creates an sbd dependency within pacemaker, so
>>>>>>      it would make sense to offer a simpler and self-contained
>>>>>>      implementation within pacemaker as an alternative
>>>>>    I think the watchdog interface is so simple that you don't need a relay
>>>>>    for it. The only limit I can imagine is the number of watchdogs available
>>>>>    on some specific hardware.
>>>>   That is the point ;-)
>>>>>>      thus it would be favorable to have the dependency
>>>>>>      within a non-compulsory pacemaker-rpm so that
>>>>>>      we can offer an alternative that doesn't use sbd,
>>>>>>      maybe at the cost of being less reliable, or one
>>>>>>      that owns a hardware watchdog by itself on systems
>>>>>>      where it is still unused.
>>>>>> 
>>>>>>      - e.g. via some kind of plugin (Andrew, forgive me: no pils ;-) )
>>>>>>      - or via an additional daemon
>>>>>> 
>>>>>>    What did you have in mind?
>>>>>>    Maybe it makes sense to synchronize...
>>>>>> 
>>>>>>    Regards,
>>>>>>    Klaus
>>>>>>    
>>>>>>>    Best Regards,
>>>>>>>    Hideo Yamauchi.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    ----- Original Message -----
>>>>>>>>    From: Ulrich Windl <[email protected]>
>>>>>>>>    To: [email protected]; [email protected]
>>>>>>>>    Cc: 
>>>>>>>>    Date: 2016/10/5, Wed 23:08
>>>>>>>>    Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>     <[email protected]> wrote on 21.09.2016 at 11:52 in
>>>>>>>>    message <[email protected]>:
>>>>>>>>>     Hi All,
>>>>>>>>> 
>>>>>>>>>     Has a final conclusion been reached on this problem?
>>>>>>>>> 
>>>>>>>>>     If a user uses sbd, can the cluster avoid the problem caused by SIGSTOP of crmd?
>>>>>>>>    As pointed out earlier, maybe crmd should feed a watchdog. Then stopping
>>>>>>>>    crmd will reboot the node (unless the watchdog fails).
>>>>>>>> 
>>>>>>>>>     We are interested in this problem, too.
>>>>>>>>> 
>>>>>>>>>     Best Regards,
>>>>>>>>> 
>>>>>>>>>     Hideo Yamauchi.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>     _______________________________________________
>>>>>>>>>     Users mailing list: [email protected]
>>>>>>>>>     http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>> 
>>>>>>>>>     Project Home: http://www.clusterlabs.org
>>>>>>>>>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>     Bugs: http://bugs.clusterlabs.org
