>>> Gabriele Bulfon <[email protected]> wrote on 17.12.2020 at 09:11 in message
<2129123894.1061.1608192712316@www>:
> Yes, sorry, I took the same batch by mistake... here are the correct logs.
>
> Yes, xstha1 has a delay of 10s so that I'm giving it precedence; xstha2 has
> a delay of 1s and will be fenced earlier.
> During the short time before xstha2 got powered off, I saw it had time to
> bring up the NFS IP (I saw a duplicated IP on xstha1).
> And because the configuration has "order zpool_data_order inf: zpool_data (
> xstha1_san0_IP )", that means xstha2 had imported the zpool for a short
> time before being fenced, and this must never happen.
>
> What suggests to me that resources were started on xstha2 (the duplicated
> IP being one effect) are these log portions from xstha2.
> These tell me it could not stop the resources on xstha1 (correct, it
> couldn't contact xstha1):
>
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> zpool_data_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
> Dec 16 15:08:56 [667] pengine: warning: custom_action: Action
> xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)

I wonder: did you remove the hostnames from the log messages? Also, are the
clocks in sync? It puzzles me that a resource is flagged "unrunnable" and
recovered within the very same second.

> These tell me xstha2 took control of resources that were actually running
> on xstha1:
>
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move
> xstha1_san0_IP ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667] pengine: info: LogActions: Leave
> xstha2_san0_IP (Started xstha2)
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Move
> zpool_data ( xstha1 -> xstha2 )
> Dec 16 15:08:56 [667] pengine: info: LogActions: Leave
> xstha1-stonith (Started xstha2)
> Dec 16 15:08:56 [667] pengine: notice: LogAction: * Stop
> xstha2-stonith ( xstha1 ) due to node availability
>
> The stonith request is the last entry because xstha2 was killed by xstha1
> before the 10s delay, which is what I wanted.

Also note that "Stop xstha2-stonith ( xstha1 ) due to node availability" is
NOT a stonith request; I have the feeling that your cluster does not use
STONITH at all. Also, the logs are really rather incomplete to tell details...
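Two quick checks would settle that. A minimal sketch, assuming pacemaker's
standard command-line tools and the crm shell are available on the nodes:

  # Is STONITH enabled at all in the cluster options?
  crm_attribute --type crm_config --name stonith-enabled --query

  # Which fence devices are actually registered with the fencer?
  stonith_admin --list-registered

If fencing really is active, the fence race you describe is usually handled
with a static per-device delay. The agent (external/ipmi) and its parameters
below are assumptions for illustration, not taken from your configuration:

  # Sketch only: fence xstha2 after a 1s delay (pcmk_delay_base, or
  # pcmk_delay_max on older releases); real device parameters omitted
  crm configure primitive xstha2-stonith stonith:external/ipmi \
      params pcmk_host_list=xstha2 pcmk_delay_base=1s

If stonith-enabled turns out to be false, the pengine will schedule the moves
you quoted without waiting for a successful fence, which would explain both
the duplicated IP and the double zpool import.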
> Gabriele
>
>
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
>
> ------------------------------------------------------------------------------
>
> From: Andrei Borzenkov <[email protected]>
> To: [email protected]
> Date: 17 December 2020, 06:38:33 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] delaying start of a resource
>
>
> 16.12.2020 17:56, Gabriele Bulfon wrote:
>> Thanks, here are the logs; there is some info about how it tried to start
>> resources on the nodes.
>
> Both logs are from the same node.
>
>> Keep in mind that node1 was already running the resources, and I simulated
>> a problem by turning down the HA interface.
>
> There is no attempt to start resources in these logs. The logs end with a
> stonith request. As this node had a delay of 10s, it probably was
> successfully eliminated by the other node, but there are no logs from the
> other node.
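P.S. To give the list complete data next time, a single report covering the
incident window from both nodes is easiest to work with. A sketch, assuming
crm_report is installed (the timestamps match your logs, the destination
path is a placeholder):

  crm_report -f "2020-12-16 15:00:00" -t "2020-12-16 15:15:00" \
      -n "xstha1 xstha2" /tmp/fence-race-report

That bundles the cluster logs and the CIB from both nodes into one archive.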
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/