Hi Dejan,

Dejan Muhamedagic wrote:
Hi Satomi-san,

On Thu, Oct 16, 2008 at 03:43:36PM +0900, Satomi TANIGUCHI wrote:
Hi Dejan,


Dejan Muhamedagic wrote:
Hi Satomi-san,

On Tue, Oct 14, 2008 at 07:07:00PM +0900, Satomi TANIGUCHI wrote:
Hi,

I found that there are 2 problems when the DC node is STONITH'ed.
(1) The STONITH operation is executed twice.
This has been discussed at length in bugzilla, see

http://developerbugs.linux-foundation.org/show_bug.cgi?id=1904

which was resolved as WONTFIX. In short, it was deemed too risky
to implement a remedy for this problem.  Of course, if you think
you can add more to the discussion, please go ahead.
Sorry, I missed it.

Well, you couldn't have known about it :)

Thank you for the pointer!
I understand how it came about.

Ideally, when the DC node is going to be STONITH'ed, a new DC would be
elected first and it would STONITH the ex-DC; then these problems would
not occur.
But that may not be a good approach in an emergency, because the ex-DC
should be STONITH'ed as soon as possible.

Yes, you're right about this.

Anyway, I understand this is an expected behavior, thanks!
But then, it seems that the tengine still has to keep a timeout while
waiting for stonithd's result, and a long cluster-delay is still required.

If I understood Andrew correctly, the tengine will wait forever,
until stonithd sends a message. Or dies, which, let's hope, won't
happen.
My perception is the same as yours.


Because the second STONITH is requested on that transition timeout.
I'm afraid that I misunderstood the true meaning of what Andrew said.

In the bugzilla? If so, please reopen and voice your concerns.
I asked him again in bugzilla, thanks!


(2) The timeout for which stonithd on the DC node waits for the result
    of a STONITH op from another node is always set to "stonith-timeout"
    in <cluster_property_set>.
[...]
Case (2):
When this timeout occurs in the DC's stonithd while a non-DC node's
stonithd is trying to reset the DC, the DC's stonithd will send a request
to another node, and two or more STONITH plugins are executed in parallel.
This is a troublesome problem.
I think the most suitable value for this timeout might be the sum total
of the "stonith-timeout" values of the STONITH plugins on the node which
is going to receive the STONITH request from the DC node.
This would probably be very difficult for the CRM to get.
Right, I agree with you.
I meant "it is difficult because stonithd on the DC can't know the values
of stonith-timeout on other nodes" by the following sentence:
"But DC node can't know that...".
But DC node can't know that...
I would like to hear your opinions.
Sorry, but I couldn't exactly follow. Could you please describe
it in terms of actions?
Sorry, let me restate what I meant.
The timeout for which the DC's stonithd waits for the reply from another
node's stonithd should, by all rights, be longer than the sum total of the
"stonith-timeout" values of the STONITH plugins on that node.
But it is very difficult for the DC's stonithd to get those values.
So I would like to hear your opinion on what a suitable and practical
value for this timeout, which is set in insert_into_executing_queue(),
would be.
I hope I have conveyed what I wanted to say.

OK, I suppose I understand now. You're talking about the timeouts
for remote fencing operations, right? And the originating
Exactly!

stonithd hasn't got a clue on how long the remote fencing
operation may take. Well, that could be a problem. I can't think
of anything to resolve that completely, not without "rewiring"
stonithd. stonithd broadcasts the request so there's no way for
it to know who's doing what and when and how long it can take.

The only workaround I can think of is to use the global (cluster
property) stonith-timeout which should be set to the maximum sum
of stonith timeouts for a node.
All right.
I misunderstood the role of the global stonith-timeout.
I took it to be just the default value for each plugin's stonith-timeout,
just as default-action-timeout is for each operation.
To use stonith-timeout correctly (without troublesome timeouts),
we should keep to the following, right?
 - set stonith-timeout for every STONITH plugin.
 - set the global stonith-timeout to the maximum sum of the stonith
   timeouts for a node.
 - (set cluster-delay to a value longer than the global stonith-timeout,
    at least at present.)
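
For instance, with hypothetical values (the ids and timeouts below are
purely illustrative, not taken from any real cluster): if a node can be
fenced by two plugins whose stonith-timeout values are 60s and 120s, the
global stonith-timeout should be at least their sum, 180s:

  <cluster_property_set id="cib-bootstrap-options">
  ...
        <!-- at least 60s + 120s, the sum of the per-plugin timeouts
             for one node -->
        <nvpair id="opt-stonith-timeout" name="stonith-timeout" value="180s"/>
        <!-- and, at least at present, a cluster-delay longer than that -->
        <nvpair id="opt-cluster-delay" name="cluster-delay" value="240s"/>
  ...
  </cluster_property_set>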


Now, back to reality ;-)  Timeouts are important, of course, but
one should usually leave a generous margin on top of the expected
duration. For instance, if the normal timeout for an operation on
a device is 30 seconds, there's nothing wrong in setting it to,
say, one or two minutes. The consequences of an operation ending
prematurely are much more serious than if one waits a bit longer.
After all, if there's something really wrong, it is usually
detected early and the error reported immediately. Of course,
one shouldn't follow this advice blindly. Know your cluster!
Understood!


For reference, I have attached logs from when the aforesaid timeout occurred.
The cluster has 3 nodes.
When the DC was going to be STONITH'ed, the DC sent a request to all of
the non-DC nodes, and all of them tried to shut down the DC.

No, the tengine (running on DC) always talks to the local
stonithd.
I meant "stonithd on the DC broadcast a request"
by the sentence "DC sent a request to all of the non-DC nodes".
I'm sorry for being ambiguous.


And when the timeout occurred on the DC's stonithd, it sent the same
request again, and then two or more STONITH plugins worked in parallel
on every non-DC node.
(Please see sysstats.txt.)

I want to make clear whether the current behavior is expected or a bug.

That's actually wrong, but could be considered a configuration
problem:

  <cluster_property_set id="cib-bootstrap-options">
  ...
        <nvpair id="nvpair.id2000009" name="stonith-timeout" value="260s"/>
  ...
  <primitive id="prmStonithN1" class="stonith" type="external/ssh">
  ...
          <nvpair id="nvpair.id2000602" name="stonith-timeout" value="390s"/>

The stonithd initiator (the one running on the DC) times out
before the remote fencing operation completes. On retry, a second remote
fencing operation is started. That's why you see two of them.
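
To spell out the arithmetic with the values quoted above (the timeline is
a sketch; exact timings depend on the cluster):

  t =    0s   stonithd on the DC broadcasts the fencing request
  t =  260s   the global stonith-timeout expires on the DC and the
              request is broadcast again
  t >  260s   a second external/ssh operation runs while the first one
              (allowed up to 390s by its own stonith-timeout) may still
              be in progress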
I set these values because I wanted to know what would happen
when the timeout for a remote fencing op occurs, and I intended to raise
it on the mailing list if curious behavior appeared. ;)


Anyway, you can open a bugzilla for this, because the stonithd on
a remote host should know that there's already one operation
running. Unfortunately, I'm busy with more urgent matters right
now, so it may take a few weeks until I take a look at it.
As usual, patches are welcome :)
I posted into bugzilla.
http://developerbugs.linux-foundation.org/show_bug.cgi?id=1983
I'm sorry to bother you.


Best Regards,
Satomi TANIGUCHI



Thanks,

Dejan

But I consider that the root of every problem is that the node which sends
the STONITH request and waits for completion of the op is itself killed.


Regards,
Satomi TANIGUCHI


Thanks,

Dejan

Best Regards,
Satomi TANIGUCHI

_______________________________________________________
Linux-HA-Dev: [EMAIL PROTECTED]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/






_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker

