Re: [ClusterLabs] why is node fenced ?

2019-08-14 Thread Ken Gaillot
On Wed, 2019-08-14 at 11:57 +0200, Lentes, Bernd wrote:
> 
> - On Aug 13, 2019, at 1:19 AM, kgaillot kgail...@redhat.com
> wrote:
> 
> 
> > 
> > The key messages are:
> > 
> > Aug 09 17:43:27 [6326] ha-idg-1   crmd: info:
> > crm_timer_popped: Election
> > Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Aug 09 17:43:27 [6326] ha-idg-1   crmd:  warning:
> > do_log:   Input
> > I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> > 
> > That indicates the newly rebooted node didn't hear from the other
> > node
> > within 20s, and so assumed it was dead.
> > 
> > The new node had quorum, but never saw the other node's corosync,
> > so
> > I'm guessing you have two_node and/or wait_for_all disabled in
> > corosync.conf, and/or you have no-quorum-policy=ignore in
> > pacemaker.
> > 
> > I'd recommend two_node: 1 in corosync.conf, with no explicit
> > wait_for_all or no-quorum-policy setting. That would ensure a
> > rebooted/restarted node doesn't get initial quorum until it has
> > seen
> > the other node.
> 
> That's my setting:
> 
> expected_votes: 2
>   two_node: 1
>   wait_for_all: 0
> 
> no-quorum-policy=ignore
> 
> I did that because I want to be able to start the cluster even if one
> node has e.g. a hardware problem.
> Is that OK?

Well that's why you're seeing what you're seeing, which is also why
wait_for_all was created :)

You definitely don't need no-quorum-policy=ignore in any case. With
two_node, corosync will continue to provide quorum to pacemaker when
one node goes away, so from pacemaker's view no-quorum-policy never
kicks in.
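
If it is currently set explicitly, removing it so the default applies
again should be enough -- for example (a sketch using the stock
pacemaker CLI; crmsh has an equivalent):

   # drop the explicit cluster property so the default takes effect
   crm_attribute --type crm_config --name no-quorum-policy --delete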

With wait_for_all enabled, the newly joining node wouldn't get quorum
initially, so it wouldn't fence the other node. So that's the trade-
off, preventing this situation vs being able to start one node alone
intentionally. Personally, I'd leave wait_for_all on normally, and
manually change it to 0 whenever I was intentionally taking one node
down for an extended time.
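
For reference, a minimal quorum stanza along those lines might look like
this (a sketch only -- adjust to your existing corosync.conf):

   quorum {
       provider: corosync_votequorum
       expected_votes: 2
       two_node: 1
       # wait_for_all defaults to 1 when two_node is set; leave it unset
       # normally, and set it to 0 only while deliberately running on a
       # single node
   }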

Of course all of that is just recovery, and doesn't explain why the
nodes can't see each other to begin with.
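
Next time it happens, it may be worth checking what corosync itself sees
on each node, e.g. (assuming the usual corosync 2.x tools):

   corosync-cfgtool -s      # ring/link status of each configured interface
   corosync-quorumtool -s   # membership and quorum as corosync sees it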

> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep,
> Heinrich Bassler, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
-- 
Ken Gaillot 



Re: [ClusterLabs] Q: "Re-initiated expired calculated failure"

2019-08-14 Thread Ken Gaillot
On Wed, 2019-08-14 at 10:24 +0200, Ulrich Windl wrote:
> (subject changed for existing thread)
> 
> Hi!
> 
> After I had thought the problem with the sticky failed monitor was
> solved
> eventually, I realized that I'm getting a message that I don't really
> understand after each cluster recheck interval:
> 
> pengine[7280]:   notice: Re-initiated expired calculated failure
> prm_nfs_server_monitor_6 (rc=7,
> magic=0:7;4:6568:0:d941efc1-de73-4ee4-b593-f65be9e90726) on h11
> 
> The message repeats absolutely identically. So what does it mean? The 

That one confuses me too.

An expired failure is simply ignored for non-recurring operations. But
for expired failures of a recurring monitor, if the node is up, the
monitor's restart digest is altered, which I believe causes it to be
cancelled and re-scheduled.

The reason in the commit message was "This is particularly relevant for
those with on-fail=block which stick around and are not cleaned up by a
subsequent stop/start."

I don't claim to understand it. :)

> monitor
> did not fail between cluster rechecks, and crm_mon is not displaying
> any failed
> operations.

Probably because it's expired. A clean-up should still get rid of it,
though.
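
In other words, either an explicit clean-up or a failure-timeout should
clear that stale entry, e.g. (a sketch, using the resource and node names
from your logs):

   # clean up the old failure explicitly
   crm_resource --cleanup --resource prm_nfs_server --node h11

   # or let it expire on its own, here after 15 minutes
   crm_resource --resource prm_nfs_server --meta \
       --set-parameter failure-timeout --parameter-value 900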

> 
> Regards,
> Ulrich
> 
> 
> > > > Ulrich Windl wrote on 13.08.2019 at 11:06 in message
> > > > <5D527D91.124 : 161 : 60728>:
> > Hi,
> > 
> > an update:
> > After setting a failure-timeout for the resource that stale monitor
> > failure
> > was removed automatically at next cluster recheck (it seems).
> > Still I wonder why a resource cleanup didn't do that (bug?).
> > 
> > Regards,
> > Ulrich
> > 
> > 
> > > > > "Ulrich Windl"  schrieb am
> > > > > 13.08.2019
> 
> um
> > 10:07 in Nachricht <
> > 5d526fb002a100032...@gwsmtp.uni-regensburg.de>:
> > > > > > Ken Gaillot  wrote on 13.08.2019 at 01:03 in
> > > > > > message:
> > > > On Mon, 2019‑08‑12 at 17:46 +0200, Ulrich Windl wrote:
> > > > > Hi!
> > > > > 
> > > > > I just noticed that a "crm resource cleanup " caused
> > > > > some
> > > > > unexpected behavior and the syslog message:
> > > > > crmd[7281]:  warning: new_event_notification (7281‑97955‑15):
> > > > > Broken
> > > > > pipe (32)
> > > > > 
> > > > > It's SLES 12 SP4 last updated Sept. 2018 (up since then,
> > > > > pacemaker‑
> > > > > 1.1.19+20180928.0d2680780‑1.8.x86_64).
> > > > > 
> > > > > The cleanup was due to a failed monitor. As an unexpected
> > > > > consequence
> > > > > of this cleanup, CRM seemed to restart the complete resource
> > > > > (and
> > > > > dependencies), even though it was running.
> > > > 
> > > > I assume the monitor failure was old, and recovery had already
> > > > completed? If not, recovery might have been initiated before
> > > > the clean‑
> > > > up was recorded.
> > > > 
> > > > > I noticed that a manual "crm_resource ‑C ‑r  ‑N "
> > > > > command
> > > > > has the same effect (multiple resources are "Cleaned up",
> > > > > resources
> > > > > are restarted seemingly before the "probe" is done.).
> > > > 
> > > > Can you verify whether the probes were done? The DC should log
> > > > a
> > > > message when each _monitor_0 result comes in.
> > > 
> > > So here's a rough sketch of events:
> > > 17:10:23 crmd[7281]:   notice: State transition S_IDLE ->
> > > S_POLICY_ENGINE
> > > ...no probes yet...
> > > 17:10:24 pengine[7280]:  warning: Processing failed monitor of 
> > > prm_nfs_server
> > > on rksaph11: not running
> > > ...lots of starts/restarts...
> > > 17:10:24 pengine[7280]:   notice:  * Restart prm_nfs_server
> > > ...
> > > 17:10:24 crmd[7281]:   notice: Processing graph 6628
> > > (ref=pe_calc-dc-1565622624-7313) derived from
> > > /var/lib/pacemaker/pengine/pe-input-1810.bz2
> > > ...monitors are being called...
> > > 17:10:24 crmd[7281]:   notice: Result of probe operation for
> > > prm_nfs_vg
> 
> on
> > > h11: 0 (ok)
> > > ...the above was the first probe result...
> > > 17:10:24 crmd[7281]:  warning: Action 33 (prm_nfs_vg_monitor_0)
> > > on h11 
> > > failed
> > > (target: 7 vs. rc: 0): Error
> > > ...not surprising to me: The resource was running; I don't know
> > > why the
> > > cluster wanted to start it...
> > > 17:10:24 crmd[7281]:   notice: Transition 6629 (Complete=9,
> > > Pending=0,
> > > Fired=0, Skipped=0, Incomplete=0,
> > > Source=/var/lib/pacemaker/pengine/pe-input-1811.bz2): Complete
> > > 17:10:24 crmd[7281]:   notice: State transition
> > > S_TRANSITION_ENGINE -> 
> > 
> > S_IDLE
> > > 
> > > The really bad thing after this is that the "cleaned up" resource
> > > still
> 
> has 
> > > a
> > > failed status (dated in the past (last-rc-change='Mon Aug 12
> > > 04:52:23 
> > > 2019')),
> > > even though "running".
> > > 
> > > I tend to believe that the cluster is in a bad state, or the
> > > software has
> 
> a
> > > problem cleaning the status of the monitor.
> > > 
> > > The CIB status for the resource looks like this:
> > >  > > class="ocf"
> > > provider="heartbeat">
> > >   

[ClusterLabs] [Announce] clufter v0.77.2 released

2019-08-14 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.77.2
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The updated test suite for this version is also provided:

or alternatively:


I am not so happy that this is limited to the bare minimum needed to get
clufter working with the upcoming Python 3.8 (which appears rather
aggressive about compatibility, even if some of that is just enforcement
of previous deprecations), plus some small changes accrued over time,
but there was no room to deliver more since the last release.
Quite a bit of catching up with recent developments, as also asked for
on this list[1], is still pending; hopefully this will get rectified soon.

[1] https://lists.clusterlabs.org/pipermail/users/2019-July/026057.html


Changelog highlights for v0.77.2 (also available as a tag message):

- Python 3 (3.8 in particular) compatibility improving release

- enhancements:
  . knowledge about mapping various platforms to particular cluster
    component sets was updated so as to target them more reliably
    -- note, however, that the current capacity for package maintenance
    does not allow adding support for new evolutions of such components,
    even when they are nominally recognized (mostly a concern regarding
    corosync3/kronosnet, and new and backward incompatible changes
    in pcs)
  . the specfile received more care regarding the use of precisely
    qualified Python interpreters, and the Python byte-compilation
    process was brought fully under explicit control where that is
    established
- internal enhancements:
  . a regression in the test suite related to the previously introduced
    text/data separation (to align with Python 3) was rectified
  . multiple newly identified issues with Python 3.8 were fixed
    (deprecated and dropped standard library objects were swapped for
    straightforward replacements, and newly imposed constraints around
    metaclasses and relative module imports were reflected)
  . an (automatically reported) resource leak (an open file descriptor)
    was resolved

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python3-clufter, clufter-cli, ...).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)



Re: [ClusterLabs] why is node fenced ?

2019-08-14 Thread Lentes, Bernd



- On Aug 13, 2019, at 1:19 AM, kgaillot kgail...@redhat.com wrote:


> 
> The key messages are:
> 
> Aug 09 17:43:27 [6326] ha-idg-1   crmd: info: crm_timer_popped: 
> Election
> Trigger (I_DC_TIMEOUT) just popped (20000ms)
> Aug 09 17:43:27 [6326] ha-idg-1   crmd:  warning: do_log:   Input
> I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> 
> That indicates the newly rebooted node didn't hear from the other node
> within 20s, and so assumed it was dead.
> 
> The new node had quorum, but never saw the other node's corosync, so
> I'm guessing you have two_node and/or wait_for_all disabled in
> corosync.conf, and/or you have no-quorum-policy=ignore in pacemaker.
> 
> I'd recommend two_node: 1 in corosync.conf, with no explicit
> wait_for_all or no-quorum-policy setting. That would ensure a
> rebooted/restarted node doesn't get initial quorum until it has seen
> the other node.

That's my setting:

expected_votes: 2
  two_node: 1
  wait_for_all: 0

no-quorum-policy=ignore

I did that because I want to be able to start the cluster even if one node has
e.g. a hardware problem.
Is that OK?


Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671



[ClusterLabs] Q: "Re-initiated expired calculated failure"

2019-08-14 Thread Ulrich Windl
(subject changed for existing thread)

Hi!

After I had thought the problem with the sticky failed monitor was solved
eventually, I realized that I'm getting a message that I don't really
understand after each cluster recheck interval:

pengine[7280]:   notice: Re-initiated expired calculated failure
prm_nfs_server_monitor_6 (rc=7,
magic=0:7;4:6568:0:d941efc1-de73-4ee4-b593-f65be9e90726) on h11

The message repeats absolutely identically. So what does it mean? The monitor
did not fail between cluster rechecks, and crm_mon is not displaying any failed
operations.
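(For completeness, I am checking with a one-shot status that includes
fail counts, e.g.:

   crm_mon -1rf   # one shot, show inactive resources and fail counts

and no failed operation shows up there either.)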

Regards,
Ulrich


>>> Ulrich Windl wrote on 13.08.2019 at 11:06 in message
>>> <5D527D91.124 : 161 : 60728>:
> Hi,
> 
> an update:
> After setting a failure-timeout for the resource that stale monitor failure

> was removed automatically at next cluster recheck (it seems).
> Still I wonder why a resource cleanup didn't do that (bug?).
> 
> Regards,
> Ulrich
> 
> 
> >>> "Ulrich Windl"  schrieb am 13.08.2019
um
> 10:07 in Nachricht <5d526fb002a100032...@gwsmtp.uni-regensburg.de>:
>  Ken Gaillot  wrote on 13.08.2019 at 01:03 in
> > message:
> >> On Mon, 2019‑08‑12 at 17:46 +0200, Ulrich Windl wrote:
> >>> Hi!
> >>> 
> >>> I just noticed that a "crm resource cleanup " caused some
> >>> unexpected behavior and the syslog message:
> >>> crmd[7281]:  warning: new_event_notification (7281‑97955‑15): Broken
> >>> pipe (32)
> >>> 
> >>> It's SLES 12 SP4 last updated Sept. 2018 (up since then, pacemaker‑
> >>> 1.1.19+20180928.0d2680780‑1.8.x86_64).
> >>> 
> >>> The cleanup was due to a failed monitor. As an unexpected consequence
> >>> of this cleanup, CRM seemed to restart the complete resource (and
> >>> dependencies), even though it was running.
> >> 
> >> I assume the monitor failure was old, and recovery had already
> >> completed? If not, recovery might have been initiated before the clean‑
> >> up was recorded.
> >> 
> >>> I noticed that a manual "crm_resource ‑C ‑r  ‑N " command
> >>> has the same effect (multiple resources are "Cleaned up", resources
> >>> are restarted seemingly before the "probe" is done.).
> >> 
> >> Can you verify whether the probes were done? The DC should log a
> >> message when each _monitor_0 result comes in.
> > 
> > So here's a rough sketch of events:
> > 17:10:23 crmd[7281]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> > ...no probes yet...
> > 17:10:24 pengine[7280]:  warning: Processing failed monitor of 
> > prm_nfs_server
> > on rksaph11: not running
> > ...lots of starts/restarts...
> > 17:10:24 pengine[7280]:   notice:  * Restart prm_nfs_server
> > ...
> > 17:10:24 crmd[7281]:   notice: Processing graph 6628
> > (ref=pe_calc-dc-1565622624-7313) derived from
> > /var/lib/pacemaker/pengine/pe-input-1810.bz2
> > ...monitors are being called...
> > 17:10:24 crmd[7281]:   notice: Result of probe operation for prm_nfs_vg
on
> > h11: 0 (ok)
> > ...the above was the first probe result...
> > 17:10:24 crmd[7281]:  warning: Action 33 (prm_nfs_vg_monitor_0) on h11 
> > failed
> > (target: 7 vs. rc: 0): Error
> > ...not surprising to me: The resource was running; I don't know why the
> > cluster wanted to start it...
> > 17:10:24 crmd[7281]:   notice: Transition 6629 (Complete=9, Pending=0,
> > Fired=0, Skipped=0, Incomplete=0,
> > Source=/var/lib/pacemaker/pengine/pe-input-1811.bz2): Complete
> > 17:10:24 crmd[7281]:   notice: State transition S_TRANSITION_ENGINE -> 
> S_IDLE
> > 
> > The really bad thing after this is that the "cleaned up" resource still
has 
> > a
> > failed status (dated in the past (last-rc-change='Mon Aug 12 04:52:23 
> > 2019')),
> > even though "running".
> > 
> > I tend to believe that the cluster is in a bad state, or the software has
a
> > problem cleaning the status of the monitor.
> > 
> > The CIB status for the resource looks like this:
> >  > provider="heartbeat">
> >> operation_key="prm_nfs_server_start_0" operation="start"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.14"
> > transition-key="67:6583:0:d941efc1-de73-4ee4-b593-f65be9e90726"
> > transition-magic="0:0;67:6583:0:d941efc1-de73-4ee4-b593-f65be9e90726"
> > exit-reason="" on_node="h11" call-id="799" rc-code="0" op-status="0"
> > interval="0" last-run="1565582351" last-rc-change="1565582351" 
> > exec-time="708"
> > queue-time="0" op-digest="73311a0ef4ba8e9f1f97e05e989f6348"/>
> >> operation_key="prm_nfs_server_monitor_6" operation="monitor"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.14"
> > transition-key="68:6583:0:d941efc1-de73-4ee4-b593-f65be9e90726"
> > transition-magic="0:0;68:6583:0:d941efc1-de73-4ee4-b593-f65be9e90726"
> > exit-reason="" on_node="h11" call-id="800" rc-code="0" op-status="0"
> > interval="6" last-rc-change="1565582351" exec-time="499"
queue-time="0"
> > op-digest="9d8aa17b2a741c8328d7896459733e56"/>
> >> operation_key="prm_nfs_server_monitor_6" operation="monitor"
> > 

Re: [ClusterLabs] Antw: Re: Antw: Re: why is node fenced ?

2019-08-14 Thread Lentes, Bernd



- On Aug 14, 2019, at 8:25 AM, Ulrich Windl 
ulrich.wi...@rz.uni-regensburg.de wrote:

 
> But why do the eth interfaces on both nodes come up in the same second
> (2019-08-09T17:42:19)?
> 

The respective eth interfaces of the two bonds on the two hosts are connected
directly to each other: just a cable, no switch in between.
That means when ethX on host1 comes up, the corresponding interface on the
other host goes online at the same moment.
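
(If it helps for cross-checking, the active slave and per-slave link state
can be seen on either host with, for example:

   cat /proc/net/bonding/bond1
)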


Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671



[ClusterLabs] Antw: Re: Antw: Re: why is node fenced ?

2019-08-14 Thread Ulrich Windl
>>> "Lentes, Bernd"  schrieb am 13.08.2019
um
16:03 in Nachricht
<854237493.2026097.1565705038122.javamail.zim...@helmholtz-muenchen.de>:

> 
> - On Aug 13, 2019, at 3:14 PM, Ulrich Windl 
> ulrich.wi...@rz.uni-regensburg.de wrote:
> 
>> You said you booted the hosts sequentially. From the logs they were
starting 
> in
>> parallel.
>> 
> 
> No. last says:

But why do the eth interfaces on both nodes come up in the same second
(2019-08-09T17:42:19)?

> ha-idg-1: 
> reboot   system boot  4.12.14-95.29-de Fri Aug  9 17:42 - 15:56 (3+22:14)
> 
> ha-idg-2:
> reboot   system boot  4.12.14-95.29-de Fri Aug  9 18:08 - 15:58 (3+21:49)
> root pts/010.35.34.70  Fri Aug  9 17:24 - crash  (00:44)
> (unknown :0   :0   Fri Aug  9 17:24 - crash  (00:44)
> reboot   system boot  4.12.14-95.29-de Fri Aug  9 17:23 - 15:58 (3+22:34)
> 
>>> This is the initialization of the bond1 on ha‑idg‑1 during boot.
>>> 3 seconds later bond1 is fine:
>>> 
>>> 2019‑08‑09T17:42:19.299886+02:00 ha‑idg‑2 kernel: [ 1232.117470] tg3
>>> :03:04.0 eth2: Link is up at 1000 Mbps, full duplex
>>> 2019‑08‑09T17:42:19.299908+02:00 ha‑idg‑2 kernel: [ 1232.117482] tg3
>>> :03:04.0 eth2: Flow control is on for TX and on for RX
>>> 2019‑08‑09T17:42:19.315756+02:00 ha‑idg‑2 kernel: [ 1232.131565] tg3
>>> :03:04.1 eth3: Link is up at 1000 Mbps, full duplex
>>> 2019‑08‑09T17:42:19.315767+02:00 ha‑idg‑2 kernel: [ 1232.131568] tg3
>>> :03:04.1 eth3: Flow control is on for TX and on for RX
>>> 2019‑08‑09T17:42:19.351781+02:00 ha‑idg‑2 kernel: [ 1232.169386] bond1:
link
>> 
>>> status definitely up for interface eth2, 1000 Mbps full duplex
>>> 2019‑08‑09T17:42:19.351792+02:00 ha‑idg‑2 kernel: [ 1232.169390] bond1:
>> making
>>> interface eth2 the new active one
>>> 2019‑08‑09T17:42:19.352521+02:00 ha‑idg‑2 kernel: [ 1232.169473] bond1:
>> first
>>> active interface up!
>>> 2019‑08‑09T17:42:19.352532+02:00 ha‑idg‑2 kernel: [ 1232.169480] bond1:
link
>> 
>>> status definitely up for interface eth3, 1000 Mbps full duplex
>>> 
>>> also on ha‑idg‑1:
>>> 
>>> 2019‑08‑09T17:42:19.168035+02:00 ha‑idg‑1 kernel: [  110.164250] tg3
>>> :02:00.3 eth3: Link is up at 1000 Mbps, full duplex
>>> 2019‑08‑09T17:42:19.168050+02:00 ha‑idg‑1 kernel: [  110.164252] tg3
>>> :02:00.3 eth3: Flow control is on for TX and on for RX
>>> 2019‑08‑09T17:42:19.168052+02:00 ha‑idg‑1 kernel: [  110.164254] tg3
>>> :02:00.3 eth3: EEE is disabled
>>> 2019‑08‑09T17:42:19.172020+02:00 ha‑idg‑1 kernel: [  110.171378] tg3
>>> :02:00.2 eth2: Link is up at 1000 Mbps, full duplex
>>> 2019‑08‑09T17:42:19.172028+02:00 ha‑idg‑1 kernel: [  110.171380] tg3
>>> :02:00.2 eth2: Flow control is on for TX and on for RX
>>> 2019‑08‑09T17:42:19.172029+02:00 ha‑idg‑1 kernel: [  110.171382] tg3
>>> :02:00.2 eth2: EEE is disabled
>>>  ...
>>> 2019‑08‑09T17:42:19.244066+02:00 ha‑idg‑1 kernel: [  110.240310] bond1:
link
>> 
>>> status definitely up for interface eth2, 1000 Mbps full duplex
>>> 2019‑08‑09T17:42:19.244083+02:00 ha‑idg‑1 kernel: [  110.240311] bond1:
>> making
>>> interface eth2 the new active one
>>> 2019‑08‑09T17:42:19.244085+02:00 ha‑idg‑1 kernel: [  110.240353] bond1:
>> first
>>> active interface up!
>>> 2019‑08‑09T17:42:19.244087+02:00 ha‑idg‑1 kernel: [  110.240356] bond1:
link
>> 
>>> status definitely up for interface eth3, 1000 Mbps full duplex
>>> 
>>> And the cluster is started afterwards on ha‑idg‑1 at 17:43:04. I don't
find
>> 
>>> further entries for problems with bond1. So i think it's not related.
>>> Time is synchronized by ntp.
> 
> The two bonding devices (bond1) are connected directly (point-to-point).
> So when eth2 or eth3, the interfaces belonging to the bond, go online on one
> host, the other host sees that immediately.
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich 
> Bassler, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 


