On Thu, 2023-02-16 at 11:13 +0100, Adam Cecile wrote:
> On 2/16/23 07:57, Ulrich Windl wrote:
> > Adam Cecile <[email protected]> wrote on 15.02.2023 at 10:49 in message
> > <[email protected]>:
> > > Hello,
> > >
> > > I just had an issue with unexpected server behavior after a reboot.
> > > This node was powered off, so the cluster was running fine with this
> > > tomcat9 resource running on a different machine.
> > >
> > > After powering this node on again, it briefly started tomcat before
> > > joining the cluster, and then decided to stop it again. I'm not sure
> > > why.
> > >
> > > Here is the output of systemctl status tomcat9 on this host:
> > >
> > > tomcat9.service - Apache Tomcat 9 Web Application Server
> > >      Loaded: loaded (/lib/systemd/system/tomcat9.service; disabled; vendor preset: enabled)
> > >     Drop-In: /etc/systemd/system/tomcat9.service.d
> > >              └─override.conf
> > >      Active: inactive (dead)
> > >        Docs: https://tomcat.apache.org/tomcat-9.0-doc/index.html
> > >
> > > Feb 15 09:43:27 server tomcat9[1398]: Starting service [Catalina]
> > > Feb 15 09:43:27 server tomcat9[1398]: Starting Servlet engine: [Apache Tomcat/9.0.43 (Debian)]
> > > Feb 15 09:43:27 server tomcat9[1398]: [...]
> > > Feb 15 09:43:29 server systemd[1]: Stopping Apache Tomcat 9 Web Application Server...
> > > Feb 15 09:43:29 server systemd[1]: tomcat9.service: Succeeded.
> > > Feb 15 09:43:29 server systemd[1]: Stopped Apache Tomcat 9 Web Application Server.
> > > Feb 15 09:43:29 server systemd[1]: tomcat9.service: Consumed 8.017s CPU time.
> > >
> > > As you can see, it is disabled and should NOT be started at boot;
> > > start/stop is under Corosync control.
> > >
> > > The systemd resource is defined like this:
> > >
> > > primitive tomcat9 systemd:tomcat9.service \
> > >     op start interval=0 timeout=120 \
> > >     op stop interval=0 timeout=120 \
> > >     op monitor interval=60 timeout=100
> > >
> > > Any idea why this happened?
> >
> > Your journal (syslog) should tell you!
>
> Indeed, I overlooked it yesterday... But it says it was Pacemaker that
> decided to start it:
>
> Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync members[3]: 1 2 3
> Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync joined[2]: 1 2
> Feb 15 09:43:26 server3 corosync[568]: [TOTEM ] A new membership (1.42d) was formed. Members joined: 1 2
> Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server1 state is now member
> Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server1 state is now member
> Feb 15 09:43:26 server3 corosync[568]: [QUORUM] This node is within the primary component and will provide service.
> Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Members[3]: 1 2 3
> Feb 15 09:43:26 server3 corosync[568]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Quorum acquired
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server1 state is now member
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server2 state is now member
> Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server2 state is now member
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Transition 0 aborted: Peer Halt
> Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server1 state is now member
> Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Another DC detected: server2 (op=noop)
> Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server2 state is now member
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State transition S_ELECTION -> S_RELEASE_DC
> Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Cancelling timer for action 12 (src=67)
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: No need to invoke the TE (A_TE_HALT) in state S_RELEASE_DC
> Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server2 state is now member
> Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State transition S_PENDING -> S_NOT_DC
> Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting #attrd-protocol[server1]: (unset) -> 2
> Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Detected another attribute writer (server2), starting new election
> Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting #attrd-protocol[server2]: (unset) -> 2
> Feb 15 09:43:27 server3 IPaddr2(Shared-IPv4)[1258]: INFO:
> Feb 15 09:43:27 server3 ntpd[602]: Listen normally on 8 eth0 10.13.68.12:123
> Feb 15 09:43:27 server3 ntpd[602]: new interface(s) found: waking up resolver
> => Feb 15 09:43:28 server3 pacemaker-controld[862]: notice: Result of start operation for tomcat9 on server3: ok
> Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 485 to 1397
> Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 485 to 1397
> Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: Global data MTU changed to: 1397
> => Feb 15 09:43:29 server3 pacemaker-controld[862]: notice: Requesting local execution of stop operation for tomcat9 on server3
>
> Any idea?
What do the logs on the other node say over the same time frame?
-- 
Ken Gaillot <[email protected]>
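To compare what each node logged over the same time frame, it can help to merge the nodes' logs into a single timeline. The following is a minimal Python sketch, not a ClusterLabs tool. It assumes each node's log has been copied locally in the classic syslog format seen in this thread ("Feb 15 09:43:28 server3 ..."), which has no year, so the year is passed in. The names SYSLOG_RE and merged_events are hypothetical. It keeps only lines that mention the resource name within the given window and sorts them chronologically across nodes.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Mon DD HH:MM:SS host rest-of-message".
# Assumption: logs use this format (as in the excerpts in this thread).
SYSLOG_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse_ts(stamp, year):
    """Parse a syslog timestamp; syslog omits the year, so the caller supplies it."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def merged_events(logs, resource, start, end, year=2023):
    """Merge per-node log lines into one timeline.

    logs: dict mapping node name -> iterable of raw log lines.
    Returns (timestamp, node, message) tuples for lines that mention
    `resource` and fall within [start, end], sorted by time, then node.
    """
    events = []
    for node, lines in logs.items():
        for line in lines:
            m = SYSLOG_RE.match(line)
            if not m:
                continue  # skip lines not in the assumed syslog format
            ts = parse_ts(m.group(1), year)
            msg = m.group(3)  # daemon name, PID and message text
            if start <= ts <= end and resource in msg:
                events.append((ts, node, msg))
    # Stable sort keeps each node's original order for identical timestamps
    events.sort(key=lambda e: (e[0], e[1]))
    return events


if __name__ == "__main__":
    sample = {
        "server3": [
            "Feb 15 09:43:28 server3 pacemaker-controld[862]: notice: Result of start operation for tomcat9 on server3: ok",
            "Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: Global data MTU changed to: 1397",
            "Feb 15 09:43:29 server3 pacemaker-controld[862]: notice: Requesting local execution of stop operation for tomcat9 on server3",
        ],
    }
    window = (datetime(2023, 2, 15, 9, 43), datetime(2023, 2, 15, 9, 44))
    for ts, node, msg in merged_events(sample, "tomcat9", *window):
        print(ts.time(), node, msg)
```

In practice each node's list would come from its own copy of the log, for example a hypothetical {"server2": open("server2.log").read().splitlines(), ...}, so that the DC's decisions on server2 line up next to the start and stop on server3.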
