Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jul 2019 17:25:57 +0200 Danka Ivanovic wrote: ... > I know it should be avoided starting master database with systemctl, but I > didn't find a way to start it with pacemaker. I will test again, but I am > out of ideas. Put the cluster in debug mode and provide the full logs +

Re: [ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot

2019-07-10 Thread Michael Powell
Thanks to you and Andrei for your responses. In our particular situation, we want to be able to operate with either node in stand-alone mode, or with both nodes protected by HA. I did not mention this, but I am working on upgrading our product from a version which used Pacemaker version

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jul 2019 17:08:45 +0200 Tiemen Ruiten wrote: > On Wed, Jul 10, 2019 at 2:47 PM Jehan-Guillaume de Rorthais > wrote: > > > > > > > I double-checked monitoring data: there was approximately one minute of > > > replication lag on one slave and two minutes of replication lag on the > >

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-10 Thread Tiemen Ruiten
On Wed, Jul 10, 2019 at 2:47 PM Jehan-Guillaume de Rorthais wrote: > > > > I double-checked monitoring data: there was approximately one minute of > > replication lag on one slave and two minutes of replication lag on the > > other slave when the original issue occurred. > > what lag? current

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jul 2019 16:34:17 +0200 Danka Ivanovic wrote: > Hi, Thank you all for responding so quickly. Part of corosync.log file is > attached. Cluster failure occurred at 09:16 AM yesterday. > Debug mode is turned on in corosync configuration, but I didn't turn it on > in pacemaker config. I

Re: [ClusterLabs] failed resource resurection - failcount/cleanup etc ?

2019-07-10 Thread Ken Gaillot
On Wed, 2019-07-10 at 11:26 +0100, lejeczek wrote: > hi guys, possibly @devel if they pop in here. > > is there, will there be, a way to make cluster deal with failed > resources in such a way that cluster would try not to give up on > failed > resources? > > I understand that as of now the only

Re: [ClusterLabs] colocation - but do not stop resources on failure

2019-07-10 Thread Ken Gaillot
On Wed, 2019-07-10 at 10:30 +0100, lejeczek wrote: > On 09/07/2019 20:26, Ken Gaillot wrote: > > On Tue, 2019-07-09 at 11:21 +0100, lejeczek wrote: > > > hi guys, > > > > > > how to, if possible, create colocation which would not stop > > > dependent > > > resources if the target(that would be

Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Tue, 9 Jul 2019 22:22:47 +0200 Tiemen Ruiten wrote: > On Tue, Jul 9, 2019 at 4:21 PM Jehan-Guillaume de Rorthais > wrote: > > > On Tue, 9 Jul 2019 13:22:06 +0200 > > Tiemen Ruiten wrote: > > > > > On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais < > > j...@dalibo.com> > >

Re: [ClusterLabs] Antw: Re: Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jul 2019 14:36:10 +0200 "Ulrich Windl" wrote: > >>> Jehan-Guillaume de Rorthais wrote on 10.07.2019 at > 13:14 in > message <20190710131427.3876ea36@firost>: > > On Wed, 10 Jul 2019 12:53:59 +0300 > > Andrei Borzenkov wrote: ... > >> Some generic mean to set it for > >>

[ClusterLabs] Antw: Re: Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Ulrich Windl
>>> Jehan-Guillaume de Rorthais wrote on 10.07.2019 at 13:14 in message <20190710131427.3876ea36@firost>: > On Wed, 10 Jul 2019 12:53:59 +0300 > Andrei Borzenkov wrote: > >> On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais >> wrote: >> >> > >> > > > Jul 09 09:16:32 [2679]

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jul 2019 12:53:59 +0300 Andrei Borzenkov wrote: > On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais > wrote: > > > > > > > Jul 09 09:16:32 [2679] postgres1 lrmd:debug: > > > > child_kill_helper: Kill pid 12735's group Jul 09 09:16:34 [2679] > > > > postgres1

[ClusterLabs] failed resource resurection - failcount/cleanup etc ?

2019-07-10 Thread lejeczek
hi guys, possibly @devel if they pop in here. is there, will there be, a way to make cluster deal with failed resources in such a way that cluster would try not to give up on failed resources? I understand that as of now the only way is user's manual intervention (under which I'd include any

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Andrei Borzenkov
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais wrote: > > > > Jul 09 09:16:32 [2679] postgres1 lrmd:debug: > > > child_kill_helper: Kill pid 12735's group Jul 09 09:16:34 [2679] > > > postgres1 lrmd: warning: child_timeout_callback: > > > PGSQL_monitor_15000

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Andrei Borzenkov
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais wrote: > > > P.S. crm_resource is called by resource agent (pgsqlms). And it shows > > result of original resource probing which makes it confusing. At least > > it explains where these logs entries come from. > > Not sure to understand

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Jehan-Guillaume de Rorthais
On Tue, 9 Jul 2019 19:57:06 +0300 Andrei Borzenkov wrote: > 09.07.2019 13:08, Danka Ivanović wrote: > > Hi I didn't manage to start master with postgres, even if I increased start > > timeout. I checked executable paths and start options. We would require much more logs from this failure... >

Re: [ClusterLabs] colocation - but do not stop resources on failure

2019-07-10 Thread lejeczek
On 09/07/2019 20:26, Ken Gaillot wrote: > On Tue, 2019-07-09 at 11:21 +0100, lejeczek wrote: >> hi guys, >> >> how to, if possible, create colocation which would not stop dependent >> resources if the target(that would be systemd agent) resource fails >> on >> all nodes? >> >> many thanks, L. >