Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-11-05 Thread renayama19661014
>>>>>>>> - Original Message -
>>>>>>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>>>>>>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>>>>>>>>> Cc:
>>>>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>>>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>>>>>>>> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>> Thank you for comment.
>>>>>>>>>>>>
>>>>>>>>>>>> We examine watchdog of crmd, too.
>>>>>>>>>>>> In addition, I comment after examination advanced.
>>>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>>>> a couple of thoughts I had so far:
>>>>>>>>>>>
>>>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>>>   an application can use to create a watchdog within sbd
>>>>>>>>>> Why has it to be done within sbd?
>>>>>>>>> Not necessarily, could be spawned out as well into an own project or
>>>>>>>>> something already existent could be taken.
>>>>>>>>> Remember to have added a dbus-interface to
>>>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>>>> If you have a suggestion I'm open.
>>>>>>>>> Going off sbd would have the advantage of a smooth start:
>>>>>>>>>
>>>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>>>>   be replaced/moved over time
>>>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>>>>   already something that is in the code and in the people's minds
>>>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>>>
>>>>>>>>>>> - first use-case would be crmd observation

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-26 Thread renayama19661014
ut earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>> Thank you for comment.
>>>>>>>>>>>
>>>>>>>>>>> We examine watchdog of crmd, too.
>>>>>>>>>>> In addition, I comment after examination advanced.
>>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>>> a couple of thoughts I had so far:
>>>>>>>>>>
>>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>>   an application can use to create a watchdog within sbd
>>>>>>>>> Why has it to be done within sbd?
>>>>>>>> Not necessarily, could be spawned out as well into an own project or
>>>>>>>> something already existent could be taken.
>>>>>>>> Remember to have added a dbus-interface to
>>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>>> If you have a suggestion I'm open.
>>>>>>>> Going off sbd would have the advantage of a smooth start:
>>>>>>>>
>>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>>>   be replaced/moved over time
>>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>>>   already something that is in the code and in the people's minds
>>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>>
>>>>>>>>>> - first use-case would be crmd observation
>>>>>>>>>>
>>>>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>>>>   from sbd by moving the actual implementation of
>>>>>>>>>>   pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>>>   into pacemaker - using the new API
>>>>>>>>>>
>>>>>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>>>>>>   that it would make sense to offer a simpler and self-contained
>>>>>>>>>>   implementation within pacemaker as an alternative
>>>>>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>>>>>> for it. The only limit I can imagine is the number of watchdogs available of
>>>>>>>>> some specific hardware.
>>>>>>>> That is the point ;-)
>>>>>>>>>>   thus it would be favorable to have the dependency
>>>>>>>>>>   within a non-compulsory pacemaker-rpm so that
>>>>>>>>>>   we can offer an alternative that doesn't use sbd
>>>>>>>>>>   at maybe the cost of being less reliable or one
>>>>>>>>>>   that owns a hardware-watchdog by itself for systems
>>>>>>>>>>   where this is still unused.
>>>>>>>>>>
>>>>>>>>>>   - e.g. via some kind of plugin (Andrew forgive me -

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-20 Thread Jan Friesse
 I had so far:

- add an API (via DBus or libqb - favoring libqb atm) to sbd
  an application can use to create a watchdog within sbd

Why has it to be done within sbd?

Not necessarily, could be spawned out as well into an own project or
something already existent could be taken.
Remember to have added a dbus-interface to
https://sourceforge.net/projects/watchdog/ for a project once.
If you have a suggestion I'm open.
Going off sbd would have the advantage of a smooth start:

- cluster/pacemaker-watcher are there already and can
  be replaced/moved over time
- the lifecycle of the daemon (when started/stopped) is
  already something that is in the code and in the people's minds

- parameters for the first are a name and a timeout

- first use-case would be crmd observation

- later on we could think of removing pacemaker dependencies
  from sbd by moving the actual implementation of
  pacemaker-watcher and probably cluster-watcher as well
  into pacemaker - using the new API

- this of course creates sbd dependency within pacemaker so
  that it would make sense to offer a simpler and self-contained
  implementation within pacemaker as an alternative

I think the watchdog interface is so simple that you don't need a relay
for it. The only limit I can imagine is the number of watchdogs available of
some specific hardware.

That is the point ;-)

  thus it would be favorable to have the dependency
  within a non-compulsory pacemaker-rpm so that
  we can offer an alternative that doesn't use sbd
  at maybe the cost of being less reliable or one
  that owns a hardware-watchdog by itself for systems
  where this is still unused.

  - e.g. via some kind of plugin (Andrew forgive me -
    no pils ;-) )
  - or via an additional daemon

What did you have in mind?
Maybe it makes sense to synchronize...

Regards,
Klaus


Best Regards,
Hideo Yamauchi.


- Original Message -
From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
Cc:
Date: 2016/10/5, Wed 23:08
Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

<renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message
<876439.61305...@web200311.mail.ssk.yahoo.co.jp>:

Hi All,

Was the final conclusion given about this problem?

If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?

As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
will reboot the node (unless the watchdog fails).

We are interested in this problem, too.

Best Regards,

Hideo Yamauchi.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-14 Thread renayama19661014
er time
>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>   already something that is in the code and in the people's minds
>>>>
>>>>>> - parameters for the first are a name and a timeout
>>>>>>
>>>>>> - first use-case would be crmd observation
>>>>>>
>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>   from sbd by moving the actual implementation of
>>>>>>   pacemaker-watcher and probably cluster-watcher as well
>>>>>>   into pacemaker - using the new API
>>>>>>
>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>>   that it would make sense to offer a simpler and self-contained
>>>>>>   implementation within pacemaker as an alternative
>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>> for it. The only limit I can imagine is the number of watchdogs available of
>>>>> some specific hardware.
>>>> That is the point ;-)
>>>>>>   thus it would be favorable to have the dependency
>>>>>>   within a non-compulsory pacemaker-rpm so that
>>>>>>   we can offer an alternative that doesn't use sbd
>>>>>>   at maybe the cost of being less reliable or one
>>>>>>   that owns a hardware-watchdog by itself for systems
>>>>>>   where this is still unused.
>>>>>>
>>>>>>   - e.g. via some kind of plugin (Andrew forgive me -
>>>>>>     no pils ;-) )
>>>>>>   - or via an additional daemon
>>>>>>
>>>>>> What did you have in mind?
>>>>>> Maybe it makes sense to synchronize...
>>>>>>
>>>>>> Regards,
>>>>>> Klaus
>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - Original Message -
>>>>>>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>>>>>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>>>>>>> Cc:
>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>> <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>>>> in message
>>>>>>>> <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Was the final conclusion given about this problem?
>>>>>>>>>
>>>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>
>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Hideo Yamauchi.

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-11 Thread renayama19661014
ble of
>>>> some specific hardware.
>>> That is the point ;-)
>>>>>   thus it would be favorable to have the dependency
>>>>>   within a non-compulsory pacemaker-rpm so that
>>>>>   we can offer an alternative that doesn't use sbd
>>>>>   at maybe the cost of being less reliable or one
>>>>>   that owns a hardware-watchdog by itself for systems
>>>>>   where this is still unused.
>>>>>
>>>>>   - e.g. via some kind of plugin (Andrew forgive me -
>>>>>     no pils ;-) )
>>>>>   - or via an additional daemon
>>>>>
>>>>> What did you have in mind?
>>>>> Maybe it makes sense to synchronize...
>>>>>
>>>>> Regards,
>>>>> Klaus
>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>>
>>>>>> - Original Message -
>>>>>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>>>>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>>>>>> Cc:
>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>> <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>>> in message
>>>>>>> <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Was the final conclusion given about this problem?
>>>>>>>>
>>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>
>>>>>>>> We are interested in this problem, too.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>>

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-10 Thread Klaus Wenninger
On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> Our user may not necessarily use sbd.
>
> I confirmed that the WD service of corosync can be used as one way of doing
> without sbd.
> Pacemaker can have its processes watched by the WD service through CMAP, so
> that the watchdog can be carried out.

Have to have a look at that...
But if we establish some in-between-layer in pacemaker we could have this
as one of the possibilities besides e.g. sbd (with enhanced API), going for
a watchdog-device directly, ...
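
To make the "name and a timeout" interface from the quoted proposal a bit more concrete, here is a minimal Python sketch of what such an in-between layer might track. It is an illustration only: `WatchdogRegistry`, `register`, `tickle`, `expired` and `may_feed_hardware` are hypothetical names, not an existing sbd, pacemaker or corosync API, and the actual backend (sbd, the corosync WD service, or a watchdog device) is deliberately left out.

```python
import time
from typing import Callable, Dict, List, Tuple


class WatchdogRegistry:
    """Tracks named software watchdogs, each with its own timeout (hypothetical API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (timeout in seconds, time of the last keepalive)
        self._entries: Dict[str, Tuple[float, float]] = {}

    def register(self, name: str, timeout: float) -> None:
        """Create a watchdog for a client such as crmd; it starts out freshly fed."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._entries[name] = (timeout, self._clock())

    def tickle(self, name: str) -> None:
        """Record a keepalive from the named client."""
        timeout, _ = self._entries[name]
        self._entries[name] = (timeout, self._clock())

    def expired(self) -> List[str]:
        """Return the names of clients that missed their timeout, sorted."""
        now = self._clock()
        return sorted(
            name
            for name, (timeout, last) in self._entries.items()
            if now - last > timeout
        )

    def may_feed_hardware(self) -> bool:
        """Keep feeding the real watchdog only while no client has expired."""
        return not self.expired()
```

With something like this, a crmd that is frozen by SIGSTOP simply stops calling `tickle("crmd")`; once its timeout passes, `may_feed_hardware()` turns false and whichever backend owns the real watchdog would let it fire.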
 
>
>
> We can prepare a patch for pacemaker.

Always helpful to discuss/clarify an idea once some code is available ...

> Has the discussion about using the WD service already finished?

Not from my pov. Just a day off ;-)

>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> - Original Message -
>> From: Klaus Wenninger <kwenn...@redhat.com>
>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>> Cc: 
>> Date: 2016/10/7, Fri 17:47
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>  Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>  message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>  On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>  Hi All,
>>>>>
>>>>>>>  If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>
>>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>  will reboot the node (unless the watchdog fails).
>>>>>  Thank you for comment.
>>>>>
>>>>>  We examine watchdog of crmd, too.
>>>>>  In addition, I comment after examination advanced.
>>>>  Was thinking of doing a small test implementation going
>>>>  a little in the direction Lars Ellenberg had been pointing out.
>>>>
>>>>  a couple of thoughts I had so far:
>>>>
>>>>  - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>    an application can use to create a watchdog within sbd
>>>  Why has it to be done within sbd?
>> Not necessarily, could be spawned out as well into an own project or
>> something already existent could be taken.
>> Remember to have added a dbus-interface to
>> https://sourceforge.net/projects/watchdog/ for a project once.
>> If you have a suggestion I'm open.
>> Going off sbd would have the advantage of a smooth start:
>>
>> - cluster/pacemaker-watcher are there already and can
>>   be replaced/moved over time
>> - the lifecycle of the daemon (when started/stopped) is
>>   already something that is in the code and in the people's minds
>>
>>>>  - parameters for the first are a name and a timeout
>>>>
>>>>  - first use-case would be crmd observation
>>>>
>>>>  - later on we could think of removing pacemaker dependencies
>>>>    from sbd by moving the actual implementation of
>>>>    pacemaker-watcher and probably cluster-watcher as well
>>>>    into pacemaker - using the new API
>>>>
>>>>  - this of course creates sbd dependency within pacemaker so
>>>>    that it would make sense to offer a simpler and self-contained
>>>>    implementation within pacemaker as an alternative
>>>  I think the watchdog interface is so simple that you don't need a relay
>>>  for it. The only limit I can imagine is the number of watchdogs available of
>>>  some specific hardware.
>> That is the point ;-)
>>>>    thus it would be favorable to have the dependency
>>>>    within a non-compulsory pacemaker-rpm so that
>>>>    we can offer an alternative that doesn't use sbd
>>>>    at maybe the cost of being less reliable or one
>>>>    that owns a hardware-watchdog by itself for systems
>>>>    where this is still unused.
>>>>
>>>>    - e.g. via some kind of plugin (Andrew forgive me -
>>>>      no pils ;-) )
>>>>    - or via an additional daemon
>>>>
>>>>  What did you have in mind?
>>>>  Maybe it makes sense to synchronize...
>>>>
>>>>  Regards,
>>>>  Klaus
>>>>   
>>>>>  Best 

[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-07 Thread Ulrich Windl
>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>  
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>
>> Thank you for comment.
>>
>> We examine watchdog of crmd, too.
>> In addition, I comment after examination advanced.
> 
> Was thinking of doing a small test implementation going
> a little in the direction Lars Ellenberg had been pointing out.
> 
> a couple of thoughts I had so far:
> 
> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>   an application can use to create a watchdog within sbd

Why has it to be done within sbd?

> 
> - parameters for the first are a name and a timeout
> 
> - first use-case would be crmd observation
> 
> - later on we could think of removing pacemaker dependencies
>   from sbd by moving the actual implementation of
>   pacemaker-watcher and probably cluster-watcher as well
>   into pacemaker - using the new API
> 
> - this of course creates sbd dependency within pacemaker so
>   that it would make sense to offer a simpler and self-contained
>   implementation within pacemaker as an alternative

I think the watchdog interface is so simple that you don't need a relay for it. 
The only limit I can imagine is the number of watchdogs available of some 
specific hardware.
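
For illustration, the kind of simple interface referred to here can be sketched in Python. On Linux, opening the watchdog device arms the timer and each write counts as a keepalive; on drivers that support "magic close", writing `V` just before closing asks the driver to disarm (this is ignored when the driver runs with nowayout). The helper names, the `is_healthy` callback and the `max_beats` bound are assumptions made so that the sketch can be run against any writable binary stream; none of them comes from sbd or pacemaker.

```python
import time
from typing import BinaryIO, Callable


def feed_watchdog(dev: BinaryIO, is_healthy: Callable[[], bool],
                  interval: float, max_beats: int) -> int:
    """Write one keepalive byte to `dev` per interval while `is_healthy()` holds.

    Returns the number of keepalives written. Feeding stops as soon as the
    health check fails, so a real watchdog would then expire and reset the node.
    """
    beats = 0
    while beats < max_beats and is_healthy():
        dev.write(b"\0")  # any write counts as a keepalive
        dev.flush()
        beats += 1
        time.sleep(interval)
    return beats


def disarm(dev: BinaryIO) -> None:
    """Magic close: send 'V' as the last byte; the caller closes `dev` afterwards."""
    dev.write(b"V")
    dev.flush()
```

Against real hardware, `dev` would be something like `open("/dev/watchdog", "wb", buffering=0)`; that should only be tried on a machine that may be rebooted. A process frozen by SIGSTOP never reaches the next `write`, which is exactly why feeding the watchdog from crmd itself would turn a frozen crmd into a node reset.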

> 
>   thus it would be favorable to have the dependency
>   within a non-compulsory pacemaker-rpm so that
>   we can offer an alternative that doesn't use sbd
>   at maybe the cost of being less reliable or one
>   that owns a hardware-watchdog by itself for systems
>   where this is still unused.
> 
>   - e.g. via some kind of plugin (Andrew forgive me -
>     no pils ;-) )
>   - or via an additional daemon
> 
> What did you have in mind?
> Maybe it makes sense to synchronize...
> 
> Regards,
> Klaus
>  
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> - Original Message -
>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp 
>>> Cc: 
>>> Date: 2016/10/5, Wed 23:08
>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>>>>>  <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>> in message
>>> <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:
>>>>  Hi All,
>>>>
>>>>  Was the final conclusion given about this problem?
>>>>
>>>>  If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>>
>>>>  We are interested in this problem, too.
>>>>
>>>>  Best Regards,
>>>>
>>>>  Hideo Yamauchi.