On 2/16/23 20:54, Ken Gaillot wrote:
On Thu, 2023-02-16 at 11:13 +0100, Adam Cecile wrote:
On 2/16/23 07:57, Ulrich Windl wrote:
Adam Cecile<[email protected]>  schrieb am 15.02.2023 um
10:49 in
Nachricht
<[email protected]>:
Hello,

Just had some issue with unexpected server behavior after reboot.
This
node was powered off, so cluster was running fine with this
tomcat9
resource running on a different machine.

After powering on this node again, it briefly started tomcat
before
joining the cluster and decided to stop it again. I'm not sure
why.


Here is the systemctl status tomcat9 on this host:

tomcat9.service - Apache Tomcat 9 Web Application Server
       Loaded: loaded (/lib/systemd/system/tomcat9.service;
disabled;
vendor preset: enabled)
      Drop-In: /etc/systemd/system/tomcat9.service.d
               └─override.conf
       Active: inactive (dead)
         Docs:https://tomcat.apache.org/tomcat-9.0-doc/index.html
Feb 15 09:43:27 server tomcat9[1398]: Starting service [Catalina]
Feb 15 09:43:27 server tomcat9[1398]: Starting Servlet engine:
[Apache
Tomcat/9.0.43 (Debian)]
Feb 15 09:43:27 server tomcat9[1398]: [...]
Feb 15 09:43:29 server systemd[1]: Stopping Apache Tomcat 9 Web
Application Server...
Feb 15 09:43:29 server systemd[1]: tomcat9.service: Succeeded.
Feb 15 09:43:29 server systemd[1]: Stopped Apache Tomcat 9 Web
Application Server.
Feb 15 09:43:29 server systemd[1]: tomcat9.service: Consumed
8.017s CPU
time.

You can see it is disabled and should NOT be started with the
same,
start/stop is under Corosync control


The systemd resource is defined like this:

primitive tomcat9 systemd:tomcat9.service \
          op start interval=0 timeout=120 \
          op stop interval=0 timeout=120 \
          op monitor interval=60 timeout=100


Any idea why this happened ?
Your journal (syslog) should tell you!
Indeed, I overlooked yesterday... But it says it's pacemaker that
decided to start it:

Feb 15 09:43:26 server3 corosync[568]:   [QUORUM] Sync members[3]: 1
2 3
Feb 15 09:43:26 server3 corosync[568]:   [QUORUM] Sync joined[2]: 1 2
Feb 15 09:43:26 server3 corosync[568]:   [TOTEM ] A new membership
(1.42d) was formed. Members joined: 1 2
Feb 15 09:43:26 server3 pacemaker-attrd[860]:  notice: Node server1
state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]:  notice: Node server1
state is now member
Feb 15 09:43:26 server3 corosync[568]:   [QUORUM] This node is within
the primary component and will provide service.
Feb 15 09:43:26 server3 corosync[568]:   [QUORUM] Members[3]: 1 2 3
Feb 15 09:43:26 server3 corosync[568]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: Quorum
acquired
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: Node
server1 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: Node
server2 state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]:  notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: Transition
0 aborted: Peer Halt
Feb 15 09:43:26 server3 pacemaker-fenced[858]:  notice: Node server1
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]:  warning: Another DC
detected: server2 (op=noop)
Feb 15 09:43:26 server3 pacemaker-fenced[858]:  notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: State
transition S_ELECTION -> S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-controld[862]:  warning: Cancelling
timer for action 12 (src=67)
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: No need to
invoke the TE (A_TE_HALT) in state S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-attrd[860]:  notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]:  notice: State
transition S_PENDING -> S_NOT_DC
Feb 15 09:43:27 server3 pacemaker-attrd[860]:  notice: Setting
#attrd-protocol[server1]: (unset) -> 2
Feb 15 09:43:27 server3 pacemaker-attrd[860]:  notice: Detected
another attribute writer (server2), starting new election
Feb 15 09:43:27 server3 pacemaker-attrd[860]:  notice: Setting
#attrd-protocol[server2]: (unset) -> 2
Feb 15 09:43:27 server3 IPaddr2(Shared-IPv4)[1258]: INFO:
Feb 15 09:43:27 server3 ntpd[602]: Listen normally on 8 eth0
10.13.68.12:123
Feb 15 09:43:27 server3 ntpd[602]: new interface(s) found: waking up
resolver
=> Feb 15 09:43:28 server3 pacemaker-controld[862]:  notice: Result
of start operation for tomcat9 on server3: ok
Feb 15 09:43:29 server3 corosync[568]:   [KNET  ] pmtud: PMTUD link
change for host: 2 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]:   [KNET  ] pmtud: PMTUD link
change for host: 1 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]:   [KNET  ] pmtud: Global data
MTU changed to: 1397
=> Feb 15 09:43:29 server3 pacemaker-controld[862]:  notice:
Requesting local execution of stop operation for tomcat9 on server3

Any idea ?
What do the logs on the other node say over the same time frame?

Here is the logs from server2, same period:

Feb 15 09:43:26 server2 corosync[524]:   [KNET  ] rx: host: 3 link: 0 is up
Feb 15 09:43:26 server2 corosync[524]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 15 09:43:26 server2 corosync[524]:   [QUORUM] Sync members[3]: 1 2 3
Feb 15 09:43:26 server2 corosync[524]:   [QUORUM] Sync joined[1]: 3
Feb 15 09:43:26 server2 corosync[524]:   [TOTEM ] A new membership (1.42d) was formed. Members joined: 3 Feb 15 09:43:26 server2 pacemaker-based[812]:  notice: Node server3. state is now member Feb 15 09:43:26 server2 pacemaker-fenced[813]:  notice: Node server3. state is now member Feb 15 09:43:26 server2 pacemaker-attrd[815]:  notice: Node server3. state is now member
Feb 15 09:43:26 server2 corosync[524]:   [QUORUM] Members[3]: 1 2 3
Feb 15 09:43:26 server2 corosync[524]:   [MAIN  ] Completed service synchronization, ready to provide service. Feb 15 09:43:26 server2 pacemaker-controld[817]:  notice: Node server3. state is now member Feb 15 09:43:26 server2 pacemaker-controld[817]:  notice: State transition S_IDLE -> S_INTEGRATION Feb 15 09:43:26 server2 pacemaker-controld[817]:  warning: Another DC detected: server3. (op=noop) Feb 15 09:43:26 server2 pacemaker-controld[817]:  notice: State transition S_ELECTION -> S_INTEGRATION Feb 15 09:43:27 server2 pacemaker-attrd[815]:  notice: Detected another attribute writer (server3.), starting new election Feb 15 09:43:27 server2 pacemaker-attrd[815]:  notice: Setting #attrd-protocol[server3.]: (unset) -> 2 Feb 15 09:43:27 server2 pacemaker-attrd[815]:  notice: Recorded local node as attribute writer (was unset) Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  notice: On loss of quorum: Ignore Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  error: Resource Shared-IPv4 is active on 2 nodes (attempting recovery) Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  notice:  * Restart    Shared-IPv4      ( server1. ) Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  notice:  * Restart    tomcat9          ( server1. )   due to required Shared-IPv4 start Feb 15 09:43:27 server2 pacemaker-schedulerd[816]:  error: Calculated transition 767 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-5.bz2 Feb 15 09:43:27 server2 pacemaker-controld[817]:  notice: Initiating stop operation tomcat9_stop_0 on server1. Feb 15 09:43:28 server2 pacemaker-controld[817]:  notice: Transition 767 aborted by operation tomcat9_start_0 'modify' on server3.: Foreign event Feb 15 09:43:29 server2 pacemaker-controld[817]:  notice: Transition 767 (Complete=2, Pending=0, Fired=0, Skipped=2, Incomplete=9, Source=/var/lib/pacemaker/pengine/pe-error-5.bz2): Stopped Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  notice: On loss of quorum: Ignore Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  error: Resource Shared-IPv4 is active on 2 nodes (attempting recovery) Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  notice:  * Restart    Shared-IPv4      ( server1. ) Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  notice:  * Move       tomcat9          ( server3. -> server1. ) Feb 15 09:43:29 server2 pacemaker-schedulerd[816]:  error: Calculated transition 768 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-6.bz2 Feb 15 09:43:29 server2 pacemaker-controld[817]:  notice: Initiating stop operation tomcat9_stop_0 on server3. Feb 15 09:43:31 server2 pacemaker-controld[817]:  notice: Initiating stop operation Shared-IPv4_stop_0 on server3. Feb 15 09:43:31 server2 pacemaker-controld[817]:  notice: Initiating stop operation Shared-IPv4_stop_0 on server1. Feb 15 09:43:31 server2 pacemaker-controld[817]:  notice: Initiating start operation Shared-IPv4_start_0 on server1. Feb 15 09:43:31 server2 pacemaker-controld[817]:  notice: Initiating monitor operation Shared-IPv4_monitor_60000 on server1. Feb 15 09:43:31 server2 pacemaker-controld[817]:  notice: Initiating start operation tomcat9_start_0 on server1. Feb 15 09:43:33 server2 pacemaker-controld[817]:  notice: Initiating monitor operation tomcat9_monitor_60000 on server1. Feb 15 09:43:33 server2 pacemaker-controld[817]:  notice: Transition 768 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-6.bz2): Complete Feb 15 09:43:33 server2 pacemaker-controld[817]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE Feb 15 09:43:38 server2 pacemaker-based[812]:  notice: Local CIB 0.206.53.01293e9ed5a687db873d1ddd384803e2 differs from server3.: 0.206.51.411962ae9850d59bc1bbf4728b9e89b2 0x55f23d164cb0
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Reply via email to