Re: [ClusterLabs] Announcing hawk-apiserver, now in ClusterLabs

2019-02-13 Thread xin

Currently the REST API supports the following GET methods (an example query
follows the list):

/api/v1/configuration/cluster_property
/api/v1/configuration/rsc_defaults
/api/v1/configuration/op_defaults
/api/v1/configuration/resources
/api/v1/configuration/resources/:id
/api/v1/configuration/primitives
/api/v1/configuration/primitives/:id
/api/v1/configuration/groups
/api/v1/configuration/groups/:id
/api/v1/configuration/groups/:id/:primitiveId
/api/v1/configuration/masters
/api/v1/configuration/masters/:id
/api/v1/configuration/clones
/api/v1/configuration/clones/:id
/api/v1/configuration/bundles
/api/v1/configuration/bundles/:id
/api/v1/configuration/bundles/:id/:primitiveId
/api/v1/configuration/nodes
/api/v1/configuration/nodes/:id
/api/v1/configuration/constraints
/api/v1/configuration/constraints/:id
/api/v1/configuration/locations
/api/v1/configuration/locations/:id
/api/v1/configuration/colocations
/api/v1/configuration/colocations/:id
/api/v1/configuration/orders
/api/v1/configuration/orders/:id
/api/v1/status/summary
/api/v1/status/nodes
/api/v1/status/nodes/:id
/api/v1/status/resources
/api/v1/status/resources/:id
/api/v1/status/failures
/api/v1/status/failures/:node
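
As a quick illustration (a minimal sketch, assuming Hawk's default HTTPS port
7630, a self-signed certificate and basic authentication with a cluster user
such as hacluster -- adjust host, port and credentials to your deployment):

  # Query the cluster status summary; -k skips verification of a
  # self-signed certificate, so prefer a trusted CA in production
  curl -k -u hacluster:secret https://node1:7630/api/v1/status/summary

  # Fetch the configuration of a single resource by its id
  curl -k -u hacluster:secret https://node1:7630/api/v1/configuration/resources/my-resource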

Here is a record.

Regards,
xin

On 2019-02-13 03:00, Kristoffer Grönlund wrote:

* Optional exposure of the CIB as a REST API. Right now this is somewhat
   primitive, but we are working on making this a more fully featured
   API.


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Jehan-Guillaume de Rorthais
On Wed, 13 Feb 2019 22:11:50 +0300
Andrei Borzenkov  wrote:

> On 13.02.2019 15:50, Maciej S wrote:
> > Can you describe at least one situation when it could happen?
> > I see situations where data on two masters can diverge but I can't find the
> > one where data gets corrupted.   
> 
> If diverged data in two databases that are supposed to be exact copies of
> each other is not corruption, I'm afraid it will be hard to find a
> convincing example ...

The one I gave earlier is a real, existing one.

And the nightmare of a split-brain in a database environment is real as well, but I'm
sure there exist some scenarios where it doesn't matter much to drop some data to
reconsolidate both nodes faster.


Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Andrei Borzenkov
On 13.02.2019 15:50, Maciej S wrote:
> Can you describe at least one situation when it could happen?
> I see situations where data on two masters can diverge but I can't find the
> one where data gets corrupted. 

If diverged data in two databases that are supposed to be exact copies of
each other is not corruption, I'm afraid it will be hard to find a
convincing example ...



Re: [ClusterLabs] ERROR: This Target already exists in configFS

2019-02-13 Thread Bryan K. Walton
On Tue, Feb 12, 2019 at 02:14:17PM -0600, Bryan K. Walton wrote:
> I'm giving it the following commands:
> 
> pcs resource create targetRHEVM ocf:heartbeat:iSCSITarget \
>   iqn="iqn.2019-02.com.leepfrog:storage.rhevm" \
>   allowed_initiators="iqn.1994-05.com.redhat:3d066d1f423e \
>   iqn.1994-05.com.redhat:84f0f7458c58" \
>   --group ISCSIGroup
> 
> 
> But when I do that, the resource fails to start:
> Operation start for targetRHEVM (ocf:heartbeat:iSCSITarget) returned:
> 'unknown error' (1)
>  >  stderr: Feb 12 14:06:19 INFO: Parameter auto_add_default_portal is
>  >  now 'false'. 
>   >  stderr: Feb 12 14:06:20 INFO: Created target
>   >  iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1. 
>>  stderr: Feb 12 14:06:20 ERROR: This Target already exists in
>>  configFS

I wanted to reply here that I was able to fix this by downloading the
latest iSCSITarget resource agent from here:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/iSCSITarget.in

Apparently, there is a bug in the agent as shipped with CentOS 7.
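
For anyone hitting the same problem, a minimal sketch of replacing the agent in
place (standard OCF path; note that the file linked above is a template, so its
@...@ placeholders need to be filled in, or use an updated resource-agents
package instead):

  # Back up the shipped agent, then install the updated copy
  cp /usr/lib/ocf/resource.d/heartbeat/iSCSITarget /usr/lib/ocf/resource.d/heartbeat/iSCSITarget.bak
  cp /tmp/iSCSITarget /usr/lib/ocf/resource.d/heartbeat/iSCSITarget
  chmod 755 /usr/lib/ocf/resource.d/heartbeat/iSCSITarget

  # Clear the failed start so the cluster retries the resource
  pcs resource cleanup targetRHEVM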

Thanks,
Bryan


Re: [ClusterLabs] Antw: Re: Is fencing really a must for Postgres failover?

2019-02-13 Thread Ken Gaillot
On Wed, 2019-02-13 at 16:29 +0100, Ulrich Windl wrote:
> Hi!
> 
> I wonder: Can we close this thread with "You have been warned, so
> please don't
> come back later, crying! In the meantime you can do what you want to
> do."?
> 
> Regards,
> Ulrich

Sure, that should be on the wiki :)

But to give some more examples:

"Split brain" is the primary situation where fencing is essential. It
happens when both nodes are alive but can't see each other. This can
happen due to a network interface failing, extremely high load on one
server, a network switch glitch, etc. The consequences depend on the
service in question.

If the service uses a floating IP to provide clients with access, and
both servers bring up the IP, then packets will randomly arrive at one
or the other server. This makes connections impossible, and your
service has downtime.
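
As a point of reference, such a floating IP is typically a single cluster
resource; a minimal sketch with placeholder address and names:

  # Floating IP managed by the cluster
  pcs resource create cluster-ip ocf:heartbeat:IPaddr2 \
      ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s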

For something like a database, the consequences can be more severe. If
both instances claim master, but can't coordinate, then some requests
might be handled by one server and other requests by the other. This
can result in completely out-of-sync data (each server may have
different rows with the same ID, rows may be deleted on only one
server, etc.). There is generally no automated way to resolve such
conflicts other than "throw away one side" or manually going through the
data row by row. For a busy and/or important database, these options
can be disastrous, and fencing is cheap in comparison.

The three most common fencing options are (1) an intelligent power
switch, (2) a hardware watchdog (sbd), or (3) disk-only fencing via
iSCSI reservations. IPMI is also commonly used, but if it's physically
on-board the server it fences, it should only be used as part of a
topology with one of the other methods. There are also fence agents
that use cloud provider APIs, for cloud-based nodes.

Hardware watchdogs are built into virtually all server-class hardware
these days, so there's not much reason to skip using one.
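
As an illustration only (a minimal sketch with placeholder names, addresses and
credentials, not a recommendation for any particular hardware), option (1)
might look roughly like this with pcs; check "pcs stonith describe
fence_ipmilan" for the parameter names your agent version expects:

  # Power-switch style fencing via IPMI (placeholder values;
  # lanplus=1 may also be needed depending on the BMC)
  pcs stonith create fence-node1 fence_ipmilan \
      ip=10.0.0.11 username=admin password=secret \
      pcmk_host_list=node1

  # Fencing is only used when it is enabled cluster-wide
  pcs property set stonith-enabled=true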

> > > > Jehan-Guillaume de Rorthais wrote on 13.02.2019 at 15:05 in
> > > > message <20190213150549.47634671@firost>:
> > On Wed, 13 Feb 2019 13:50:17 +0100
> > Maciej S  wrote:
> > 
> > > Can you describe at least one situation when it could happen?
> > > I see situations where data on two masters can diverge but I
> > > can't find
> 
> the
> > > one where data gets corrupted. Or maybe you think that some kind
> > > of
> > > restoration is required in case of diverged data, but this is not
> > > my use
> > > case (I can live with a loss of some data on one branch and
> > > recover it
> 
> from
> > > working master).
> > 
> > With imagination and some "if", we can describe some scenario, but
> > chaos is
> > much
> > more creative than me. But anyway, bellow is a situation:
> > 
> >   PostgreSQL doesn't do sanity check when starting as a standby and
> > catching
> > up
> >   with a primary. If your old primary crashed and catch up with the
> > new one
> >   without some housecleaning first by a human (rebuilding it or
> > using
> >   pg_rewind), it will be corrupted.
> > 
> > Please, do not leave on a public mailing list dangerous assumptions
> > like
> > "fencing is like for additional precaution". It is not, in a lot a 
> > situation,
> > PostgreSQL included.
> > 
> > I know there is use cases where extreme-HA-failure-coverage is not
> 
> required.
> > Typically, implementing 80% of the job is enough or just make sure
> > the 
> > service
> > is up, no matter the data loss. In such case, maybe you can avoid
> > the 
> > complexity
> > of a "state of the art full HA stack with seat-belt helmet and
> > parachute" 
> > and
> > have something cheaper.
> > 
> > As instance, Patroni is a very good alternative, but a PostgreSQL-
> > only 
> > solution.
> > At least, it has the elegance to use an external DCS for Quorum and
> > Watchdog
> > as
> > fencing-of-the-poor-man and self-fencing solution.
> > 
> > 
> > > śr., 13 lut 2019 o 13:10 Jehan-Guillaume de Rorthais <
> > > j...@dalibo.com>
> > > napisał(a):
> > > 
> > > > On Wed, 13 Feb 2019 13:02:30 +0100
> > > > Maciej S  wrote:
> > > >  
> > > > > Thank you all for the answers. I can see your point, but
> > > > > anyway it
> 
> seems
> > > > > that fencing is like for additional precaution.  
> > > > 
> > > > It's not.
> > > >  
> > > > > If my requirements allow some manual intervention in some
> > > > > cases (eg.
> > > > > unknown resource state after failover), then I might go ahead
> > > > > without
> > > > > fencing. At least until STONITH is not mandatory :)  
> > > > 
> > > > Well, then soon or later, we'll talk again about how to quickly
> > > > restore
> > > > your
> > > > service and/or data. And the answer will be difficult to
> > > > swallow.
> > > > 
> > > > Good luck :)
> > > >  
> > > > > pon., 11 lut 2019 o 17:54 Digimer 
> > > > > napisał(a):
> > > > >  
> > > > > > On 2019-02-11 6:34 a.m., Maciej S wrote:  
> > > > > > > I was wondering if anyone can give a plain answer if
> > > > > > > fencing is 

Re: [ClusterLabs] Antw: Re: Is fencing really a must for Postgres failover?

2019-02-13 Thread Klaus Wenninger
On 02/13/2019 04:29 PM, Ulrich Windl wrote:
> Hi!
>
> I wonder: Can we close this thread with "You have been warned, so please don't
> come back later, crying! In the meantime you can do what you want to do."?

I think something like Digimer's answer is the better and
more general advice:

If you think you don't need fencing then you probably don't need a
cluster (or you missed something ;-) ).

Klaus

>
> Regards,
> Ulrich
>
 Jehan-Guillaume de Rorthais wrote on 13.02.2019 at 15:05 in
> message <20190213150549.47634671@firost>:
>> On Wed, 13 Feb 2019 13:50:17 +0100
>> Maciej S  wrote:
>>
>>> Can you describe at least one situation when it could happen?
>>> I see situations where data on two masters can diverge but I can't find
> the
>>> one where data gets corrupted. Or maybe you think that some kind of
>>> restoration is required in case of diverged data, but this is not my use
>>> case (I can live with a loss of some data on one branch and recover it
> from
>>> working master).
>> With imagination and some "if", we can describe some scenario, but chaos is
>> much
>> more creative than me. But anyway, bellow is a situation:
>>
>>   PostgreSQL doesn't do sanity check when starting as a standby and catching
>> up
>>   with a primary. If your old primary crashed and catch up with the new one
>>   without some housecleaning first by a human (rebuilding it or using
>>   pg_rewind), it will be corrupted.
>>
>> Please, do not leave on a public mailing list dangerous assumptions like
>> "fencing is like for additional precaution". It is not, in a lot a 
>> situation,
>> PostgreSQL included.
>>
>> I know there is use cases where extreme-HA-failure-coverage is not
> required.
>> Typically, implementing 80% of the job is enough or just make sure the 
>> service
>> is up, no matter the data loss. In such case, maybe you can avoid the 
>> complexity
>> of a "state of the art full HA stack with seat-belt helmet and parachute" 
>> and
>> have something cheaper.
>>
>> As instance, Patroni is a very good alternative, but a PostgreSQL-only 
>> solution.
>> At least, it has the elegance to use an external DCS for Quorum and Watchdog
>> as
>> fencing-of-the-poor-man and self-fencing solution.
>>
>>
>>> śr., 13 lut 2019 o 13:10 Jehan-Guillaume de Rorthais 
>>> napisał(a):
>>>
 On Wed, 13 Feb 2019 13:02:30 +0100
 Maciej S  wrote:
  
> Thank you all for the answers. I can see your point, but anyway it
> seems
> that fencing is like for additional precaution.  
 It's not.
  
> If my requirements allow some manual intervention in some cases (eg.
> unknown resource state after failover), then I might go ahead without
> fencing. At least until STONITH is not mandatory :)  
 Well, then soon or later, we'll talk again about how to quickly restore
 your
 service and/or data. And the answer will be difficult to swallow.

 Good luck :)
  
> pon., 11 lut 2019 o 17:54 Digimer  napisał(a):
>  
>> On 2019-02-11 6:34 a.m., Maciej S wrote:  
>>> I was wondering if anyone can give a plain answer if fencing is  
 really  
>>> needed in case there are no shared resources being used (as far as
> I
>>> define shared resource).
>>>
>>> We want to use PAF or other Postgres (with replicated data files on
>  
 the  
>>> local drives) failover agent together with Corosync, Pacemaker and
>>> virtual IP resource and I am wondering if there is a need for
> fencing
>>> (which is very close bind to an infrastructure) if a Pacemaker is
>>> already controlling resources state. I know that in failover case 
 there  
>>> might be a need to add functionality to recover master that
> entered
>>> dirty shutdown state (eg. in case of power outage), but I can't see
>  
 any  
>>> case where fencing is really necessary. Am I wrong?
>>>
>>> I was looking for a strict answer but I couldn't find one...
>>>
>>> Regards,
>>> Maciej  
>> Fencing is as required as a wearing a seat belt in a car. You can
>> physically make things work, but the first time you're "in an  
 accident",  
>> you're screwed.
>>
>> Think of it this way;
>>
>> If services can run in two or more places at the same time without
>> coordination, you don't need a cluster, just run things everywhere.
> If
>> you need coordination though, you need fencing.
>>
>> The role of fencing is to force a node that has entered into an
> unknown
>> state and force it into a known state. In a system that requires
>> coordination, often times fencing is the only way to ensure sane  
 operation.  
>> Also, with pacemaker v2, fencing (stonith) became mandatory at a
>> programmatic level.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com/w/ 
>> "I am, somehow, less interested in the weight and convolutions of
>> Einstein’s brain than in the 

[ClusterLabs] Antw: Re: Is fencing really a must for Postgres failover?

2019-02-13 Thread Ulrich Windl
Hi!

I wonder: Can we close this thread with "You have been warned, so please don't
come back later, crying! In the meantime you can do what you want to do."?

Regards,
Ulrich

>>> Jehan-Guillaume de Rorthais wrote on 13.02.2019 at 15:05 in
message <20190213150549.47634671@firost>:
> On Wed, 13 Feb 2019 13:50:17 +0100
> Maciej S  wrote:
> 
>> Can you describe at least one situation when it could happen?
>> I see situations where data on two masters can diverge but I can't find
the
>> one where data gets corrupted. Or maybe you think that some kind of
>> restoration is required in case of diverged data, but this is not my use
>> case (I can live with a loss of some data on one branch and recover it
from
>> working master).
> 
> With imagination and some "if", we can describe some scenario, but chaos is

> much
> more creative than me. But anyway, bellow is a situation:
> 
>   PostgreSQL doesn't do sanity check when starting as a standby and catching

> up
>   with a primary. If your old primary crashed and catch up with the new one
>   without some housecleaning first by a human (rebuilding it or using
>   pg_rewind), it will be corrupted.
> 
> Please, do not leave on a public mailing list dangerous assumptions like
> "fencing is like for additional precaution". It is not, in a lot a 
> situation,
> PostgreSQL included.
> 
> I know there is use cases where extreme-HA-failure-coverage is not
required.
> Typically, implementing 80% of the job is enough or just make sure the 
> service
> is up, no matter the data loss. In such case, maybe you can avoid the 
> complexity
> of a "state of the art full HA stack with seat-belt helmet and parachute" 
> and
> have something cheaper.
> 
> As instance, Patroni is a very good alternative, but a PostgreSQL-only 
> solution.
> At least, it has the elegance to use an external DCS for Quorum and Watchdog

> as
> fencing-of-the-poor-man and self-fencing solution.
> 
> 
>> śr., 13 lut 2019 o 13:10 Jehan-Guillaume de Rorthais 
>> napisał(a):
>> 
>> > On Wed, 13 Feb 2019 13:02:30 +0100
>> > Maciej S  wrote:
>> >  
>> > > Thank you all for the answers. I can see your point, but anyway it
seems
>> > > that fencing is like for additional precaution.  
>> >
>> > It's not.
>> >  
>> > > If my requirements allow some manual intervention in some cases (eg.
>> > > unknown resource state after failover), then I might go ahead without
>> > > fencing. At least until STONITH is not mandatory :)  
>> >
>> > Well, then soon or later, we'll talk again about how to quickly restore
>> > your
>> > service and/or data. And the answer will be difficult to swallow.
>> >
>> > Good luck :)
>> >  
>> > > pon., 11 lut 2019 o 17:54 Digimer  napisał(a):
>> > >  
>> > > > On 2019-02-11 6:34 a.m., Maciej S wrote:  
>> > > > > I was wondering if anyone can give a plain answer if fencing is  
>> > really  
>> > > > > needed in case there are no shared resources being used (as far as
I
>> > > > > define shared resource).
>> > > > >
>> > > > > We want to use PAF or other Postgres (with replicated data files on
 
>> > the  
>> > > > > local drives) failover agent together with Corosync, Pacemaker and
>> > > > > virtual IP resource and I am wondering if there is a need for
fencing
>> > > > > (which is very close bind to an infrastructure) if a Pacemaker is
>> > > > > already controlling resources state. I know that in failover case 

>> > there  
>> > > > > might be a need to add functionality to recover master that
entered
>> > > > > dirty shutdown state (eg. in case of power outage), but I can't see
 
>> > any  
>> > > > > case where fencing is really necessary. Am I wrong?
>> > > > >
>> > > > > I was looking for a strict answer but I couldn't find one...
>> > > > >
>> > > > > Regards,
>> > > > > Maciej  
>> > > >
>> > > > Fencing is as required as a wearing a seat belt in a car. You can
>> > > > physically make things work, but the first time you're "in an  
>> > accident",  
>> > > > you're screwed.
>> > > >
>> > > > Think of it this way;
>> > > >
>> > > > If services can run in two or more places at the same time without
>> > > > coordination, you don't need a cluster, just run things everywhere.
If
>> > > > you need coordination though, you need fencing.
>> > > >
>> > > > The role of fencing is to force a node that has entered into an
unknown
>> > > > state and force it into a known state. In a system that requires
>> > > > coordination, often times fencing is the only way to ensure sane  
>> > operation.  
>> > > >
>> > > > Also, with pacemaker v2, fencing (stonith) became mandatory at a
>> > > > programmatic level.
>> > > >
>> > > > --
>> > > > Digimer
>> > > > Papers and Projects: https://alteeve.com/w/ 
>> > > > "I am, somehow, less interested in the weight and convolutions of
>> > > > Einstein’s brain than in the near certainty that people of equal
talent
>> > > > have lived and died in cotton fields and sweatshops." - Stephen Jay 

>> > Gould  

Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Jehan-Guillaume de Rorthais
On Wed, 13 Feb 2019 13:50:17 +0100
Maciej S  wrote:

> Can you describe at least one situation when it could happen?
> I see situations where data on two masters can diverge but I can't find the
> one where data gets corrupted. Or maybe you think that some kind of
> restoration is required in case of diverged data, but this is not my use
> case (I can live with a loss of some data on one branch and recover it from
> working master).

With imagination and some "if"s we can describe some scenarios, but chaos is much
more creative than me. Anyway, below is one situation:

  PostgreSQL doesn't do any sanity check when starting as a standby and catching up
  with a primary. If your old primary crashed and catches up with the new one
  without some housecleaning first by a human (rebuilding it or using
  pg_rewind), it will be corrupted.

Please do not leave dangerous assumptions like "fencing is like for additional
precaution" on a public mailing list. It is not, in a lot of situations,
PostgreSQL included.

I know there are use cases where extreme HA failure coverage is not required.
Typically, implementing 80% of the job is enough, or you just need to make sure
the service is up, no matter the data loss. In such cases, maybe you can avoid
the complexity of a "state of the art full HA stack with seat belt, helmet and
parachute" and have something cheaper.

For instance, Patroni is a very good alternative, but a PostgreSQL-only solution.
At least it has the elegance of using an external DCS for quorum and a watchdog
as poor man's fencing and a self-fencing solution.


> śr., 13 lut 2019 o 13:10 Jehan-Guillaume de Rorthais 
> napisał(a):
> 
> > On Wed, 13 Feb 2019 13:02:30 +0100
> > Maciej S  wrote:
> >  
> > > Thank you all for the answers. I can see your point, but anyway it seems
> > > that fencing is like for additional precaution.  
> >
> > It's not.
> >  
> > > If my requirements allow some manual intervention in some cases (eg.
> > > unknown resource state after failover), then I might go ahead without
> > > fencing. At least until STONITH is not mandatory :)  
> >
> > Well, then soon or later, we'll talk again about how to quickly restore
> > your
> > service and/or data. And the answer will be difficult to swallow.
> >
> > Good luck :)
> >  
> > > pon., 11 lut 2019 o 17:54 Digimer  napisał(a):
> > >  
> > > > On 2019-02-11 6:34 a.m., Maciej S wrote:  
> > > > > I was wondering if anyone can give a plain answer if fencing is  
> > really  
> > > > > needed in case there are no shared resources being used (as far as I
> > > > > define shared resource).
> > > > >
> > > > > We want to use PAF or other Postgres (with replicated data files on  
> > the  
> > > > > local drives) failover agent together with Corosync, Pacemaker and
> > > > > virtual IP resource and I am wondering if there is a need for fencing
> > > > > (which is very close bind to an infrastructure) if a Pacemaker is
> > > > > already controlling resources state. I know that in failover case  
> > there  
> > > > > might be a need to add functionality to recover master that entered
> > > > > dirty shutdown state (eg. in case of power outage), but I can't see  
> > any  
> > > > > case where fencing is really necessary. Am I wrong?
> > > > >
> > > > > I was looking for a strict answer but I couldn't find one...
> > > > >
> > > > > Regards,
> > > > > Maciej  
> > > >
> > > > Fencing is as required as a wearing a seat belt in a car. You can
> > > > physically make things work, but the first time you're "in an  
> > accident",  
> > > > you're screwed.
> > > >
> > > > Think of it this way;
> > > >
> > > > If services can run in two or more places at the same time without
> > > > coordination, you don't need a cluster, just run things everywhere. If
> > > > you need coordination though, you need fencing.
> > > >
> > > > The role of fencing is to force a node that has entered into an unknown
> > > > state and force it into a known state. In a system that requires
> > > > coordination, often times fencing is the only way to ensure sane  
> > operation.  
> > > >
> > > > Also, with pacemaker v2, fencing (stonith) became mandatory at a
> > > > programmatic level.
> > > >
> > > > --
> > > > Digimer
> > > > Papers and Projects: https://alteeve.com/w/
> > > > "I am, somehow, less interested in the weight and convolutions of
> > > > Einstein’s brain than in the near certainty that people of equal talent
> > > > have lived and died in cotton fields and sweatshops." - Stephen Jay  
> > Gould  


Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Maciej S
Can you describe at least one situation in which it could happen?
I see situations where data on two masters can diverge, but I can't find
one where data gets corrupted. Or maybe you think that some kind of
restoration is required in case of diverged data, but this is not my use
case (I can live with a loss of some data on one branch and recover it from
the working master).

Thanks,
Maciej

On Wed, 13 Feb 2019 at 13:10, Jehan-Guillaume de Rorthais  wrote:

> On Wed, 13 Feb 2019 13:02:30 +0100
> Maciej S  wrote:
>
> > Thank you all for the answers. I can see your point, but anyway it seems
> > that fencing is like for additional precaution.
>
> It's not.
>
> > If my requirements allow some manual intervention in some cases (eg.
> > unknown resource state after failover), then I might go ahead without
> > fencing. At least until STONITH is not mandatory :)
>
> Well, then soon or later, we'll talk again about how to quickly restore
> your
> service and/or data. And the answer will be difficult to swallow.
>
> Good luck :)
>
> > pon., 11 lut 2019 o 17:54 Digimer  napisał(a):
> >
> > > On 2019-02-11 6:34 a.m., Maciej S wrote:
> > > > I was wondering if anyone can give a plain answer if fencing is
> really
> > > > needed in case there are no shared resources being used (as far as I
> > > > define shared resource).
> > > >
> > > > We want to use PAF or other Postgres (with replicated data files on
> the
> > > > local drives) failover agent together with Corosync, Pacemaker and
> > > > virtual IP resource and I am wondering if there is a need for fencing
> > > > (which is very close bind to an infrastructure) if a Pacemaker is
> > > > already controlling resources state. I know that in failover case
> there
> > > > might be a need to add functionality to recover master that entered
> > > > dirty shutdown state (eg. in case of power outage), but I can't see
> any
> > > > case where fencing is really necessary. Am I wrong?
> > > >
> > > > I was looking for a strict answer but I couldn't find one...
> > > >
> > > > Regards,
> > > > Maciej
> > >
> > > Fencing is as required as a wearing a seat belt in a car. You can
> > > physically make things work, but the first time you're "in an
> accident",
> > > you're screwed.
> > >
> > > Think of it this way;
> > >
> > > If services can run in two or more places at the same time without
> > > coordination, you don't need a cluster, just run things everywhere. If
> > > you need coordination though, you need fencing.
> > >
> > > The role of fencing is to force a node that has entered into an unknown
> > > state and force it into a known state. In a system that requires
> > > coordination, often times fencing is the only way to ensure sane
> operation.
> > >
> > > Also, with pacemaker v2, fencing (stonith) became mandatory at a
> > > programmatic level.
> > >
> > > --
> > > Digimer
> > > Papers and Projects: https://alteeve.com/w/
> > > "I am, somehow, less interested in the weight and convolutions of
> > > Einstein’s brain than in the near certainty that people of equal talent
> > > have lived and died in cotton fields and sweatshops." - Stephen Jay
> Gould
> > >
>
>
>
> --
> Jehan-Guillaume de Rorthais
> Dalibo
>


Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Jehan-Guillaume de Rorthais
On Wed, 13 Feb 2019 13:02:30 +0100
Maciej S  wrote:

> Thank you all for the answers. I can see your point, but anyway it seems
> that fencing is like for additional precaution.

It's not.

> If my requirements allow some manual intervention in some cases (eg.
> unknown resource state after failover), then I might go ahead without
> fencing. At least until STONITH is not mandatory :)

Well, then sooner or later we'll talk again about how to quickly restore your
service and/or data. And the answer will be difficult to swallow.

Good luck :)

> pon., 11 lut 2019 o 17:54 Digimer  napisał(a):
> 
> > On 2019-02-11 6:34 a.m., Maciej S wrote:  
> > > I was wondering if anyone can give a plain answer if fencing is really
> > > needed in case there are no shared resources being used (as far as I
> > > define shared resource).
> > >
> > > We want to use PAF or other Postgres (with replicated data files on the
> > > local drives) failover agent together with Corosync, Pacemaker and
> > > virtual IP resource and I am wondering if there is a need for fencing
> > > (which is very close bind to an infrastructure) if a Pacemaker is
> > > already controlling resources state. I know that in failover case there
> > > might be a need to add functionality to recover master that entered
> > > dirty shutdown state (eg. in case of power outage), but I can't see any
> > > case where fencing is really necessary. Am I wrong?
> > >
> > > I was looking for a strict answer but I couldn't find one...
> > >
> > > Regards,
> > > Maciej  
> >
> > Fencing is as required as a wearing a seat belt in a car. You can
> > physically make things work, but the first time you're "in an accident",
> > you're screwed.
> >
> > Think of it this way;
> >
> > If services can run in two or more places at the same time without
> > coordination, you don't need a cluster, just run things everywhere. If
> > you need coordination though, you need fencing.
> >
> > The role of fencing is to force a node that has entered into an unknown
> > state and force it into a known state. In a system that requires
> > coordination, often times fencing is the only way to ensure sane operation.
> >
> > Also, with pacemaker v2, fencing (stonith) became mandatory at a
> > programmatic level.
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.com/w/
> > "I am, somehow, less interested in the weight and convolutions of
> > Einstein’s brain than in the near certainty that people of equal talent
> > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
> >  



-- 
Jehan-Guillaume de Rorthais
Dalibo


Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Maciej S
Thank you all for the answers. I can see your point, but anyway it seems
that fencing is like for additional precaution.
If my requirements allow some manual intervention in some cases (e.g. an
unknown resource state after failover), then I might go ahead without
fencing. At least as long as STONITH is not mandatory :)

Thanks,
Maciej


On Mon, 11 Feb 2019 at 17:54, Digimer  wrote:

> On 2019-02-11 6:34 a.m., Maciej S wrote:
> > I was wondering if anyone can give a plain answer if fencing is really
> > needed in case there are no shared resources being used (as far as I
> > define shared resource).
> >
> > We want to use PAF or other Postgres (with replicated data files on the
> > local drives) failover agent together with Corosync, Pacemaker and
> > virtual IP resource and I am wondering if there is a need for fencing
> > (which is very close bind to an infrastructure) if a Pacemaker is
> > already controlling resources state. I know that in failover case there
> > might be a need to add functionality to recover master that entered
> > dirty shutdown state (eg. in case of power outage), but I can't see any
> > case where fencing is really necessary. Am I wrong?
> >
> > I was looking for a strict answer but I couldn't find one...
> >
> > Regards,
> > Maciej
>
> Fencing is as required as a wearing a seat belt in a car. You can
> physically make things work, but the first time you're "in an accident",
> you're screwed.
>
> Think of it this way;
>
> If services can run in two or more places at the same time without
> coordination, you don't need a cluster, just run things everywhere. If
> you need coordination though, you need fencing.
>
> The role of fencing is to force a node that has entered into an unknown
> state and force it into a known state. In a system that requires
> coordination, often times fencing is the only way to ensure sane operation.
>
> Also, with pacemaker v2, fencing (stonith) became mandatory at a
> programmatic level.
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>


Re: [ClusterLabs] Antw: Announcing hawk-apiserver, now in ClusterLabs

2019-02-13 Thread Adam Spiers

Ulrich Windl  wrote:

Hello!

I'd like to comment as an "old" SuSE customer:
I'm amazed that lighttpd is dropped in favor of some new go application:
SuSE now has a base system that needs (correct me if I'm wrong): shell, perl,
python, java, go, ruby, ...?


Sorry for the off-topic nitpick, but my OCD requires me to point this
out ;-)  The name switched from SuSE to SUSE (all caps) around 15
years ago:

http://www.internetnews.com/dev-news/article.php/3085261
https://en.wikipedia.org/wiki/SUSE_Linux



Re: [ClusterLabs] fence_azure_arm flooding syslog

2019-02-13 Thread Oyvind Albrigtsen

That's weird. I initially tested it in US East 2.

I'd check that the packages/versions for python-azure-sdk,
python-msrest, python-msrestazure, python-keyring, etc. are the same
on your nodes in the Europe and US locations.

If they are the same, you could create a support ticket on Azure to
ask if there have been any changes to the API in the US, or if they can
see what's causing it to fail in the US locations and not in Europe.
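
For example (a sketch assuming RPM-based nodes; the exact package names may
differ on your distribution), the versions could be compared with:

  # Run on one node in each region and compare the output
  rpm -q python-azure-sdk python-msrest python-msrestazure python-keyring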

On 12/02/19 17:37 +0100, Thomas Berreis wrote:

Network fencing is working correctly, therefore the credentials should be fine.
Maybe there are differences between the Azure Cloud in Europe and the US, because I
get these errors only with our clusters in US Central and US East 2, but not
in Europe West and North. Any hint?

Thomas

-Original Message-
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Oyvind
Albrigtsen
Sent: Dienstag, 12. Februar 2019 09:51
To: Cluster Labs - All topics related to open-source clustering welcomed

Subject: Re: [ClusterLabs] fence_azure_arm flooding syslog

At least that limits what the issue could be.

I'd check if the credentials allow getting and setting
network settings:
https://github.com/ClusterLabs/fence-agents/blob/master/agents/azure_arm/fen
ce_azure_arm.py#L204

On 11/02/19 16:46 +0100, Thomas Berreis wrote:

Hi Oyvind,

Debug mode doesn't help me. The error occurs after successfully getting the
Bearer token. No other issues are shown:

DEBUG:requests_oauthlib.oauth2_session:Invoking 0 token response hooks.
2019-02-11 15:27:12,687 DEBUG: Invoking 0 token response hooks.
DEBUG:requests_oauthlib.oauth2_session:Obtained token {u'resource':
u'https://management.core.windows.net/', u'access_token': ***',
u'ext_expires_in': u'3599', u'expires_in': u'3599', u'expires_at':
1549902431.687557, u'token_type': u'Bearer', u'not_before':
u'1549898532',
u'expires_on': u'1549902432'}.
2019-02-11 15:27:12,687 DEBUG: Obtained token {u'resource':
u'https://management.core.windows.net/', u'access_token': u'***',
u'ext_expires_in': u'3599', u'expires_in': u'3599', u'expires_at':
1549902431.687557, u'token_type': u'Bearer', u'not_before':
u'1549898532',
u'expires_on': u'1549902432'}.
WARNING:msrestazure.azure_active_directory:Keyring cache token has failed:
No recommended backend was available. Install the keyrings.alt package
if you want to use the non-recommended backends. See README.rst for

details.

2019-02-11 15:27:12,687 WARNING: Keyring cache token has failed: No
recommended backend was available. Install the keyrings.alt package if
you want to use the non-recommended backends. See README.rst for details.
DEBUG:requests_oauthlib.oauth2_session:Encoding `client_id` "***" with
`client_secret` as Basic auth credentials.
2019-02-11 15:27:12,727 DEBUG: Encoding `client_id` "***" with
`client_secret` as Basic auth credentials.
DEBUG:requests_oauthlib.oauth2_session:Requesting url
https://login.microsoftonline.com/***/oauth2/token using method POST.
2019-02-11 15:27:12,727 DEBUG: Requesting url
https://login.microsoftonline.com/***/oauth2/token using method POST.

2019-02-11 15:27:13,072 DEBUG: Obtained token {u'resource':
u'https://management.core.windows.net/', u'access_token': u'e***w',
u'ext_expires_in': u'3600', u'expires_in': u'3600', u'expires_at':
1549902433.072768, u'token_type': u'Bearer', u'not_before':
u'1549898533',
u'expires_on': u'1549902433'}.
WARNING:msrestazure.azure_active_directory:Keyring cache token has failed:
No recommended backend was available. Install the keyrings.alt package
if you want to use the non-recommended backends. See README.rst for

details.

2019-02-11 15:27:13,073 WARNING: Keyring cache token has failed: No
recommended backend was available. Install the keyrings.alt package if
you want to use the non-recommended backends. See README.rst for details.
INFO:root:getting power status for VM node01

Thomas

-Original Message-
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Oyvind
Albrigtsen
Sent: Montag, 11. Februar 2019 14:58
To: Cluster Labs - All topics related to open-source clustering
welcomed 
Subject: Re: [ClusterLabs] fence_azure_arm flooding syslog

You can try adding verbose=1 and see if that gives you some more
details on where it's failing.
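
If the device is managed through pcs, that could be done with something like
the following (the resource name is just a placeholder):

  # Enable verbose logging on the existing Azure fence device
  pcs stonith update azure-fence verbose=1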

On 11/02/19 14:47 +0100, Thomas Berreis wrote:

We need this feature because shutdown / reboot takes too much time ( >
5
min) and network fencing fences the virtual machine much faster ( < 5

sec).

We finished all the required steps and network fencing works as
expected, but I'm still confused about these errors in the log and the
failure counts shown by pcs ...

fence_azure_arm: Keyring cache token has failed: No recommended
backend was available. Install the keyrings.alt package if you want to
use the non-recommended backends. See README.rst for details.

stonith-ng[7789]: warning: fence_azure_arm[7896] stderr: [ 2019-02-11
13:41:19,178 WARNING: Keyring cache token has failed: No recommended
backend was available. Install the keyrings.alt package if you 

Re: [ClusterLabs] Antw: Announcing hawk-apiserver, now in ClusterLabs

2019-02-13 Thread Kristoffer Grönlund
Ulrich Windl   writes:

> Hello!
>
> I'd like to comment as an "old" SuSE customer:
> I'm amazed that lighttpd is dropped in favor of some new go application:
> SuSE now has a base system that needs (correct me if I'm wrong): shell, perl,
> python, java, go, ruby, ...?
>

Oh, that list is a lot longer, and this is not the first Go project to
make it into SLE.

> Maybe each programmer has his favorite. Personally I also learned quite a lot
> of languages (and even editors), but most being equivalent, you'll have to
> decide whether it makes sense to start using still another language (go in 
> this
> case). Especially i'm afraid of single-vendor languages...

TBH I am more sceptical about languages designed by committee ;)

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
 Kristoffer Grönlund wrote on 12.02.2019 at 20:00 in
> message <87mun0g7c9@suse.com>:
>> Hello everyone,
>> 
>> I just wanted to send out an email about the hawk-apiserver project
>> which was moved into the ClusterLabs organization on Github today. This
>> project is used by us at SUSE for Hawk in our latest releases already,
>> and is also available in openSUSE for use with Hawk. However, I am
>> hoping that it can prove to be useful more generally, not just for Hawk
>> but for other projects that may want to integrate with Pacemaker using
>> the C API, and also to show what is possible when using the API.
>> 
>> To describe the hawk-apiserver briefly, I'll start by describing the use
>> case it was designed to cover: Previously, we were using lighttpd as the
>> web server for Hawk (a Ruby on Rails application), but a while ago the
>> maintainers of lighttpd decided that since Hawk was the only user of
>> this project in SLE, they would like to remove it from the next
>> release. This left Apache as the web server available to us, which has
>> some interesting issues for Hawk: Mainly, we expect people to run apache
>> as a resource in the cluster which might result in a confusing mix of
>> processes on the systems.
>> 
>> At the same time, I had started looking at Go and discovered how easy it
>> was to write a basic proxying web server in Go. So, as an experiment I
>> decided to see if I could replace the use of lighttpd with a custom web
>> server written in Go. Turns out the answer was yes! Once we had our own
>> web server, I discovered new things we could do with it. So here are
>> some of the other unique features in hawk-apiserver now:
>> 
>> * SSL certificate termination, and automatic detection and redirection
>>   from HTTP to HTTPS *on the same port*: Hawk runs on port 7630, and if
>>   someone accesses that port via HTTP, they will get a redirect to the
>>   same port but on HTTPS. It's magic.
>> 
>> * Persistent connection to Pacemaker via the C API, enabling instant
>>   change notification to the web frontend. From the point of view of the
>>   web frontend, this is a long-lived connection which completes when
>>   something changes in the CIB. On the backend side, it uses goroutines
>>   to enable thousands of such long-lived connections with minimal
>>   overhead.
>> 
>> * Optional exposure of the CIB as a REST API. Right now this is somewhat
>>   primitive, but we are working on making this a more fully featured
>>   API.
>> 
>> * Configurable static file serving routes (serve images on /img from
>>   /srv/http/images for example).
>> 
>> * Configurable proxying of subroutes to other web applications.
>> 
>> The URL to the project is https://github.com/ClusterLabs/hawk-apiserver,
>> I hope you will find it useful. Comments, issues and contributions are
>> of course more than welcome.
>> 
>> One final note: hawk-apiserver uses a project called go-pacemaker
>> located at https://github.com/krig/go-pacemaker. I indend to transfer
>> this to ClusterLabs as well. go-pacemaker is still somewhat rough around
>> the edges, and our plan is to work on the C API of pacemaker to make
>> using and exposing it via Go easier, as well as moving functionality
>> from crm_mon into the C API so that status information can be made
>> available in a more convenient format via the API as well.
>> 
>> -- 
>> // Kristoffer Grönlund
>> // kgronl...@suse.com 

-- 
// Kristoffer Grönlund
// kgronl...@suse.com