Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-25 Thread Simo Sorce
On Thu, 2015-06-25 at 09:53 +0200, Petr Vobornik wrote:
 On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:
 
  On 06/24/2015 09:01 PM, Simo Sorce wrote:
  On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
  Oleg,
 
  the topology plugin relies on existing connections between servers which
  remain in a topology. If you remove a central node in your topology you
  are asking for trouble.
  With Petr's patch it warns you that your topology will be disconnected,
  and if you insist we cannot guarantee anything.
  should we completely prohibit this ?
  No, but a --force should be needed.
  Without a --force option we should not allow removing a replica
  completely from another one.
 
  I don't know, I think you could
  also enforce an uninstall of vm175 with probably the same result.
  what you mean by calculating the remaining topology and sending it to the
  remaining servers does not work; it would require sending a removal of a
  segment, which would be rejected.
  You would have to connect to each replica that has a replication
  agreement with vm175 and remove the segment from that replica. But it
  wouldn't really help much as once a replica is isolated from the central
  one, it will not see the other operations going on in other replicas.
 
  Once we have a topology resolver we will be able to warn that removing a
  specific replica will cause a split brain and make very loud warnings
  we have this already, see the output of Oleg's example:
 
  ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
  Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
  disconnected:
  Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
  vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
  Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
  vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
  vm-127.idm.lab.eng.brq.redhat.com
  Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
  vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
  vm-036.idm.lab.eng.brq.redhat.com
  Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
  vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
  Continue to delete? [no]: yes
 
  it tells you that the topology gets disconnected and which connections
  will be missing; the continue yes/no is the --force.
  The question was: should we allow a force in this situation?
 
 
 What it does is:
 1. Checks current topology, prints errors with introduction msg:
    "Current topology is disconnected:" + errors
 2. Checks topology after node removal, prints errors with msg:
    "Topology after removal of %s will be disconnected:" + errors
 3. If there were errors in #1 or #2, it does:
    if not force and not ipautil.user_input("Continue to delete?", False):
        sys.exit("Aborted")
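The three-step check described here is essentially a graph-connectivity test over the list of topology segments. A minimal sketch of such a test (illustrative only, with made-up function and variable names; not the actual ipa-replica-manage code):

```python
from collections import deque

def unreachable_after_removal(segments, removed):
    """Map each remaining server to the servers it could no longer
    contact once `removed` and all of its segments are gone."""
    adj = {}
    nodes = set()
    for left, right in segments:
        nodes.update((left, right))
        if removed in (left, right):
            continue  # segments touching the removed node disappear
        adj.setdefault(left, set()).add(right)
        adj.setdefault(right, set()).add(left)
    nodes.discard(removed)

    errors = {}
    for start in nodes:
        # Breadth-first search from each remaining server.
        seen = {start}
        queue = deque([start])
        while queue:
            for peer in adj.get(queue.popleft(), ()):
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        missing = nodes - seen
        if missing:
            errors[start] = sorted(missing)
    return errors
```

Running this on Oleg's star topology (four spokes around vm-175 plus the 036-to-244 segment) and removing vm-175 reproduces the per-server "can't contact servers" report quoted in this thread.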
 
 
 To make it louder we can prefix the msg in #2 with "WARNING:" or
 something even stronger
 
 The question "Continue to delete?" could be
 * removed, and therefore --force would always be required in such a case
 * still regarded as 'force', but with the question changed, e.g.
 to: "Continue to delete and disconnect the topology?"

I do not like questions very much; they are usually annoying for
scripting and such. I would not ask questions, and would simply deny the
operation if --force is not present, and allow it if it is present.
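The proposed gating boils down to a plain policy decision with no interactive prompt. A tiny sketch of that rule (hypothetical helper, not FreeIPA code):

```python
def removal_allowed(disconnects_topology, force):
    """Deny a removal that would disconnect the topology unless
    --force was given; never prompt, so scripts behave predictably."""
    return force or not disconnects_topology
```

A caller would print the disconnection report and exit non-zero whenever this returns False.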

  More interesting would be if we can heal this later by adding new
  segments.
  Indeed, reconnecting all the severed replicas should cause all the
  removals (segments or servers) to be replicated among servers and should
  bring back the topology view in a consistent state. But not until all
  servers are reconnected and replication has started again.
  This healing can also be required without forcing removal by an admin.
  If you have a star topology and your central node goes down and is not
  recoverable

Yes, I think the most likely case (bar testing) for ever using --force
remove is that a server imploded and died, and just needs replacing.
Being able to recover from such a situation by simply reconnecting
replicas until the split brain is healed is paramount.

I would go as far as saying that perhaps we should provide a simple
heal-topology command in a *future* version that will pick one replica
and reconnect all the missing branches in a star topology.

The only problem in doing that is that the tool may have a misleading
idea of the status of the topology, given that when replication is
severed not all topology changes may be reflected to all servers. So
different servers may have a different view of the current topology
based on when they got disconnected and the replication flow was
interrupted. So a good tool would have to reconnect all branches it
sees, then wait a little to see if the reconnected replicas send in
topology changes, and re-iterate if further changes left the topology
still in split brain.
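The reconnect-wait-reiterate loop described above could look roughly like this; all four callables are assumptions standing in for whatever topology API such a future tool would use:

```python
import time

def heal_topology(hub, list_servers, list_reachable, add_segment,
                  max_rounds=5, settle_seconds=0):
    """Repeatedly reconnect to `hub` every server it cannot reach,
    then wait and re-check: freshly reconnected replicas may replicate
    in further topology changes that reveal more severed branches."""
    for _ in range(max_rounds):
        missing = set(list_servers()) - set(list_reachable(hub)) - {hub}
        if not missing:
            return True  # topology is connected from hub's point of view
        for server in sorted(missing):
            add_segment(hub, server)  # create a new segment hub <-> server
        time.sleep(settle_seconds)   # let replication catch up
    return False
```

The bounded round count matters for exactly the reason given in the text: each server's view of the topology may be stale, so the loop has to converge rather than trust any single snapshot.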

Another tool could 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-25 Thread Petr Spacek
On 25.6.2015 09:53, Petr Vobornik wrote:
 On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:

 On 06/24/2015 09:01 PM, Simo Sorce wrote:
 On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
 Oleg,

 the topology plugin relies on existing connections between servers which
 remain in a topology. If you remove a central node in your topology you
 are asking for trouble.
 With Petr's patch it warns you that your topology will be disconnected,
 and if you insist we cannot guarantee anything.
 should we completely prohibit this ?
 No, but a --force should be needed.
 Without a --force option we should not allow removing a replica
 completely from another one.

 I don't know, I think you could
 also enforce an uninstall of vm175 with probably the same result.
 what you mean by calculating the remaining topology and sending it to the
 remaining servers does not work; it would require sending a removal of a
 segment, which would be rejected.
 You would have to connect to each replica that has a replication
 agreement with vm175 and remove the segment from that replica. But it
 wouldn't really help much as once a replica is isolated from the central
 one, it will not see the other operations going on in other replicas.

 Once we have a topology resolver we will be able to warn that removing a
 specific replica will cause a split brain and make very loud warnings
 we have this already, see the output of Oleg's example:

 ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
 Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
 disconnected:
 Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
 vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
 Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
 vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
 vm-127.idm.lab.eng.brq.redhat.com
 Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
 vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
 vm-036.idm.lab.eng.brq.redhat.com
 Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
 vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
 Continue to delete? [no]: yes

 it tells you that the topology gets disconnected and which connections
 will be missing; the continue yes/no is the --force.
 The question was: should we allow a force in this situation?

 
 What it does is:
 1. Checks current topology, prints errors with introduction msg:
    "Current topology is disconnected:" + errors
 2. Checks topology after node removal, prints errors with msg:
    "Topology after removal of %s will be disconnected:" + errors
 3. If there were errors in #1 or #2, it does:
    if not force and not ipautil.user_input("Continue to delete?", False):
        sys.exit("Aborted")
 
 
 To make it louder we can prefix the msg in #2 with "WARNING:" or
 something even stronger
 
 The question "Continue to delete?" could be
 * removed, and therefore --force would always be required in such a case
 * still regarded as 'force', but with the question changed, e.g. to:
 "Continue to delete and disconnect the topology?"

Nitpick:
I'm not a native English speaker, but "Current topology is disconnected" does
not sound clear and scary enough to me.

At the very least, the line should start with WARNING: to follow the same
pattern as all other warnings.

Also it would be nice to add something descriptive like 'Changes will not
be replicated to all servers and data WILL become inconsistent.'

Or possibly 'GATE TO HELL IS WIDE OPEN'? :-)

Of course all this needs to be rephrased to proper English ...

Petr^2 Spacek


 More interesting would be if we can heal this later by adding new
 segments.
 Indeed, reconnecting all the severed replicas should cause all the
 removals (segments or servers) to be replicated among servers and should
 bring back the topology view in a consistent state. But not until all
 servers are reconnected and replication has started again.
 This healing can also be required without forcing removal by an admin.
 If you have a star topology and your central node goes down and is not
 recoverable

 Simo.


 Ludwig
 On 06/24/2015 11:04 AM, Oleg Fayans wrote:
 Hi everybody,

 Current implementation of topology plugin (including patch 878 from
 Petr) allows the deletion of the central node in the star topology.
 I had the following topology:

 snip

-- 
Manage your subscription for the Freeipa-devel mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-devel
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code


Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-25 Thread Ludwig Krispenz


On 06/24/2015 09:01 PM, Simo Sorce wrote:

On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers which
remain in a topology. If you remove a central node in your topology you
are asking for trouble.
With Petr's patch it warns you that your topology will be disconnected,
and if you insist we cannot guarantee anything.
should we completely prohibit this ?

No, but a --force should be needed.
Without a --force option we should not allow removing a replica
completely from another one.


I don't know, I think you could
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to the
remaining servers does not work; it would require sending a removal of a
segment, which would be rejected.

You would have to connect to each replica that has a replication
agreement with vm175 and remove the segment from that replica. But it
wouldn't really help much as once a replica is isolated from the central
one, it will not see the other operations going on in other replicas.

Once we have a topology resolver we will be able to warn that removing a
specific replica will cause a split brain and make very loud warnings

we have this already, see the output of Oleg's example:

ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, 
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes

it tells you that the topology gets disconnected and which connections
will be missing; the continue yes/no is the --force.

The question was: should we allow a force in this situation?


More interesting would be if we can heal this later by adding new segments.

Indeed, reconnecting all the severed replicas should cause all the
removals (segments or servers) to be replicated among servers and should
bring back the topology view in a consistent state. But not until all
servers are reconnected and replication has started again.
This healing can also be required without forcing removal by an admin. 
If you have a star topology and your central node goes down and is not 
recoverable


Simo.



Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from
Petr) allows the deletion of the central node in the star topology.
I had the following topology:

vm056   vm036
    \   /  |
   vm175   |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on
all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on each
node leading to total infrastructure inconsistency:
===
vm056 thought the topology was as follows:
vm056   vm036
        /  |
   vm175   |
    /   \  |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
   Segment name: 036-to-244
   Left node: vm-036.idm.lab.eng.brq.redhat.com
   Right node: vm-244.idm.lab.eng.brq.redhat.com
   Connectivity: both

   Segment name:
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
   Left node: vm-036.idm.lab.eng.brq.redhat.com
   Right node: 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-25 Thread Petr Vobornik

On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:


On 06/24/2015 09:01 PM, Simo Sorce wrote:

On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers which
remain in a topology. If you remove a central node in your topology you
are asking for trouble.
With Petr's patch it warns you that your topology will be disconnected,
and if you insist we cannot guarantee anything.
should we completely prohibit this ?

No, but a --force should be needed.
Without a --force option we should not allow removing a replica
completely from another one.


I don't know, I think you could
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to the
remaining servers does not work; it would require sending a removal of a
segment, which would be rejected.

You would have to connect to each replica that has a replication
agreement with vm175 and remove the segment from that replica. But it
wouldn't really help much as once a replica is isolated from the central
one, it will not see the other operations going on in other replicas.

Once we have a topology resolver we will be able to warn that removing a
specific replica will cause a split brain and make very loud warnings

we have this already, see the output of Oleg's example:

ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes

it tells you that the topology gets disconnected and which connections
will be missing; the continue yes/no is the --force.
The question was: should we allow a force in this situation?



What it does is:
1. Checks current topology, prints errors with introduction msg:
   "Current topology is disconnected:" + errors
2. Checks topology after node removal, prints errors with msg:
   "Topology after removal of %s will be disconnected:" + errors
3. If there were errors in #1 or #2, it does:
   if not force and not ipautil.user_input("Continue to delete?", False):
       sys.exit("Aborted")


To make it louder we can prefix the msg in #2 with "WARNING:" or 
something even stronger


The question "Continue to delete?" could be
* removed, and therefore --force would always be required in such a case
* still regarded as 'force', but with the question changed, e.g. 
to: "Continue to delete and disconnect the topology?"




More interesting would be if we can heal this later by adding new
segments.

Indeed, reconnecting all the severed replicas should cause all the
removals (segments or servers) to be replicated among servers and should
bring back the topology view in a consistent state. But not until all
servers are reconnected and replication has started again.

This healing can also be required without forcing removal by an admin.
If you have a star topology and your central node goes down and is not
recoverable


Simo.



Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from
Petr) allows the deletion of the central node in the star topology.
I had the following topology:


snip
--
Petr Vobornik



Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Ludwig Krispenz

Oleg,

the topology plugin relies on existing connections between servers which 
remain in a topology. If you remove a central node in your topology you 
are asking for trouble.
With Petr's patch it warns you that your topology will be disconnected, 
and if you insist we cannot guarantee anything.
should we completely prohibit this? I don't know, I think you could 
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to the 
remaining servers does not work; it would require sending a removal of a 
segment, which would be rejected.


The topology is broken, and I don't know how much we should invest in 
making this info consistent on all servers.


More interesting would be if we can heal this later by adding new segments.

Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from 
Petr) allows the deletion of the central node in the star topology.

I had the following topology:

vm056   vm036
    \   /  |
   vm175   |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, 
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on 
all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on each 
node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056   vm036
        /  |
   vm175   |
    /   \  |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com

  Left node: vm-175.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 4

===
both vm036 and vm244 thought the topology was as follows:
vm056   vm036
    \      |
   vm175   |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
--
3 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-056.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 3


===
vm127 thought the topology was as follows:
vm056   vm036
    \   /  |
   vm175   |
        \  |
vm127   vm244

[10:31:08]ofayans@vm-127:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Oleg Fayans



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this ? I don't know, I think you could 
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to the 
remaining servers does not work; it would require sending a removal of 
a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest in 
making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised by this case: obviously, 
when none of the nodes possesses the correct topology information 
(including the one which deleted the central node), there is no way to 
fix it by adding segments connecting the nodes that became disconnected. 
I still think that the recalculation of the resulting tree should be 
done at least on the node that performs the removal action. And when 
later some other node gets connected, it should understand somehow that 
its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from 
Petr) allows the deletion of the central node in the star topology.

I had the following topology:

vm056   vm036
    \   /  |
   vm175   |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, 
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on 
all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on 
each node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056   vm036
        /  |
   vm175   |
    /   \  |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com

  Left node: vm-175.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 4

===
both vm036 and vm244 thought the topology was as follows:
vm056   vm036
    \      |
   vm175   |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
--
3 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-056.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 

[Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Oleg Fayans

Hi everybody,

Current implementation of topology plugin (including patch 878 from 
Petr) allows the deletion of the central node in the star topology.

I had the following topology:

vm056   vm036
    \   /  |
   vm175   |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, 
vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all 
nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on each 
node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056   vm036
        /  |
   vm175   |
    /   \  |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com

  Left node: vm-175.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 4

===
both vm036 and vm244 thought the topology was as follows:
vm056   vm036
    \      |
   vm175   |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
--
3 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-056.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 3


===
vm127 thought the topology was as follows:
vm056   vm036
    \   /  |
   vm175   |
        \  |
vm127   vm244

[10:31:08]ofayans@vm-127:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-056.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com

  Left node: vm-175.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 4


If I, for example, add a segment connecting vm127 and vm244, these two 
nodes will not synchronize the topology 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Oleg Fayans



On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this? I don't know, I think you could 
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to 
the remaining servers does not work, it would require sending a 
removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest 
in making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: 
obviously, when none of the nodes possess the correct topology 
information (including the one which deleted the central node), there 
is no way to fix it by adding segments connecting the nodes that 
became disconnected. 
It should not need the full information, but it has to be able to reach 
one of the nodes to be connected. When the topology is broken, you 
lose the ability to apply a change on any node; e.g. in your 
case, if you want to connect vm036 and vm056 and have removed vm175, you 
have to do it on vm056, vm036 or vm244. This should work, if not we 
have to fix it - unless we completely prevent disconnecting a topology
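The situation described here, working out which servers can still reach each other once vm175 is gone, is a connected-components computation over the segment list. A minimal sketch (illustrative only, not the topology plugin's actual code; hostnames shortened for readability):

```python
# Sketch: compute which servers remain mutually reachable after a node
# is removed, given the topology segments as (left, right) pairs.
# This mirrors the check behind the "will be disconnected" warning.

def components(servers, segments, removed):
    """Return the connected components left after dropping `removed`."""
    parent = {s: s for s in servers if s != removed}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in segments:
        if removed in (left, right):
            continue  # segments touching the removed node disappear
        parent[find(left)] = find(right)

    groups = {}
    for s in parent:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(g) for g in groups.values())

servers = ["vm036", "vm056", "vm127", "vm175", "vm244"]
segments = [("vm056", "vm175"), ("vm127", "vm175"),
            ("vm175", "vm244"), ("vm036", "vm175"),
            ("vm036", "vm244")]

print(components(servers, segments, removed="vm175"))
# → [['vm036', 'vm244'], ['vm056'], ['vm127']]
```

With the hub removed, only vm036 and vm244 stay in one component, which matches the "can't contact servers" lines in the ipa-replica-manage output; a new segment can only be added from a node inside the component you want to grow.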
Well, this is exactly the problem here: all replicas should contain 
precise copies of all the info: accounts, hosts, sudorules, etc., 
including topology information. However, if in this case I manually 
connect the disconnected node at vm127 (or vm056, it does not matter), it 
results in topology information inconsistency across the infrastructure:

This would be the topology from the point of view of vm127:

vm056  vm036
 \/  |
 vm175 |
  \  |
vm127   vm244

And this - from the point of view of vm244 and vm036

vm056  vm036
 \   |
 vm175 |
 |
vm127   -  vm244
I still think that the recalculation of the resulting tree should be 
done at least on the node that performs the removal action. And when 
later some other node gets connected, it should understand somehow 
that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from 
Petr) allows the deletion of the central node in the star topology.

I had the following topology:

vm056  vm036
 \ / |
 vm175 |
 / \ |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on 
all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on 
each node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056  vm036
   / |
 vm175 |
 / \ |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Ludwig Krispenz


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this? I don't know, I think you could 
also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to 
the remaining servers does not work, it would require sending a 
removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest in 
making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: obviously, 
when none of the nodes possess the correct topology information 
(including the one which deleted the central node), there is no way to 
fix it by adding segments connecting the nodes that became disconnected. 
It should not need the full information, but it has to be able to reach 
one of the nodes to be connected. When the topology is broken, you lose 
the ability to apply a change on any node; e.g. in your case if 
you want to connect vm036 and vm056 and have removed vm175, you have to 
do it on vm056, vm036 or vm244. This should work, if not we have to fix 
it - unless we completely prevent disconnecting a topology
I still think that the recalculation of the resulting tree should be 
done at least on the node that performs the removal action. And when 
later some other node gets connected, it should understand somehow 
that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 from 
Petr) allows the deletion of the central node in the star topology.

I had the following topology:

vm056  vm036
 \ / |
 vm175 |
 / \ |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on 
all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on 
each node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056  vm036
   / |
 vm175 |
 / \ |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left node: vm-127.idm.lab.eng.brq.redhat.com
  Right node: vm-175.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com

  Left node: vm-175.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

Number of entries returned 4

===
both vm036 and vm244 thought the topology was as follows:
vm056  vm036
 \   |
 vm175 |
 /   |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Ludwig Krispenz


On 06/24/2015 12:02 PM, Oleg Fayans wrote:



On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this? I don't know, I think you 
could also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to 
the remaining servers does not work, it would require sending a 
removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest 
in making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: 
obviously, when none of the nodes possess the correct topology 
information (including the one which deleted the central node), 
there is no way to fix it by adding segments connecting the nodes 
that became disconnected. 
It should not need the full information, but it has to be able to 
reach one of the nodes to be connected. When the topology is broken, 
you lose the ability to apply a change on any node; e.g. in 
your case, if you want to connect vm036 and vm056 and have removed 
vm175, you have to do it on vm056, vm036 or vm244. This should work, 
if not we have to fix it - unless we completely prevent disconnecting 
a topology
Well, this is exactly the problem here: all replicas should contain 
precise copies of all the info: accounts, hosts, sudorules, etc., 
including topology information. However, if in this case I manually 
connect the disconnected node at vm127 (or vm056, it does not matter), it 
results in topology information inconsistency across the infrastructure:

This would be the topology from the point of view of vm127:
did you add the connection on vm127 or on vm244? sorry, but in these 
situations, to understand what's going on, it can matter.
to me it looks like you did it on vm127, so it's there, it got replicated 
to vm244, but replication back does not work, and so the deletion of the 
segments to vm175, which should still be in the changelogs of 036 and 244, 
doesn't get to 127. Do you have anything in the error logs of 244?




vm056  vm036
 \/  |
 vm175 |
  \  |
vm127   vm244

And this - from the point of view of vm244 and vm036

vm056  vm036
 \   |
 vm175 |
 |
vm127   -  vm244
I still think that the recalculation of the resulting tree should be 
done at least on the node that performs the removal action. And when 
later some other node gets connected, it should understand somehow 
that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 
from Petr) allows the deletion of the central node in the star 
topology.

I had the following topology:

vm056  vm036
 \ / |
 vm175 |
 / \ |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will 
be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements 
on all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on 
each node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056  vm036
   / |
 vm175 |
 / \ |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Oleg Fayans



On 06/24/2015 12:02 PM, Oleg Fayans wrote:



On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this? I don't know, I think you 
could also enforce an uninstall of vm175 with probably the same result.
what you mean by calculating the remaining topology and sending it to 
the remaining servers does not work, it would require sending a 
removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest 
in making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: 
obviously, when none of the nodes possess the correct topology 
information (including the one which deleted the central node), 
there is no way to fix it by adding segments connecting the nodes 
that became disconnected. 
It should not need the full information, but it has to be able to 
reach one of the nodes to be connected. When the topology is broken, 
you lose the ability to apply a change on any node; e.g. in 
your case, if you want to connect vm036 and vm056 and have removed 
vm175, you have to do it on vm056, vm036 or vm244. This should work, 
if not we have to fix it - unless we completely prevent disconnecting 
a topology
Well, this is exactly the problem here: all replicas should contain 
precise copies of all the info: accounts, hosts, sudorules, etc., 
including topology information. However, if in this case I manually 
connect the disconnected node at vm127 (or vm056, it does not matter), it 
results in topology information inconsistency across the infrastructure:

This would be the topology from the point of view of vm127:

vm056  vm036
 \/  |
 vm175 |
  \  |
vm127   vm244


sorry, I meant
vm056  vm036
 \/  |
 vm175 |
  \  |
vm127 - vm244



And this - from the point of view of vm244 and vm036

vm056  vm036
 \   |
 vm175 |
 |
vm127   -  vm244
I still think that the recalculation of the resulting tree should be 
done at least on the node that performs the removal action. And when 
later some other node gets connected, it should understand somehow 
that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 
from Petr) allows the deletion of the central node in the star 
topology.

I had the following topology:

vm056  vm036
 \ / |
 vm175 |
 / \ |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will 
be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements 
on all nodes:

vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However an arbitrary set of replication agreements was deleted on 
each node leading to total infrastructure inconsistency:

===
vm056 thought the topology was as follows:
vm056  vm036
   / |
 vm175 |
 / \ |
vm127   vm244
[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
--
4 segments matched
--
  Segment name: 036-to-244
  Left node: vm-036.idm.lab.eng.brq.redhat.com
  Right node: vm-244.idm.lab.eng.brq.redhat.com
  Connectivity: both

  Segment name: 
vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com

  Left 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Petr Spacek
On 24.6.2015 13:09, Ludwig Krispenz wrote:
 
 On 06/24/2015 12:50 PM, Oleg Fayans wrote:


 On 06/24/2015 12:28 PM, Ludwig Krispenz wrote:

 On 06/24/2015 12:02 PM, Oleg Fayans wrote:


 On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:

 On 06/24/2015 11:36 AM, Oleg Fayans wrote:


 On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:
 Oleg,

 the topology plugin relies on existing connections between servers which
 remain in a topology. If you remove a central node in your topology you
 are asking for trouble.
 With Petr's patch it warns you that your topology will be disconnected,
 and if you insist we cannot guarantee anything.
 Agree. I just wanted to try edge cases to see how one can break the
 system :)
 should we completely prohibit this? I don't know, I think you could
 also enforce an uninstall of vm175 with probably the same result.
 what you mean by calculating the remaining topology and sending it to the
 remaining servers does not work, it would require sending a removal of
 a segment, which would be rejected.

 The topology is broken, and I don't know how much we should invest in
 making this info consistent on all servers.

 More interesting would be if we can heal this later by adding new
 segments.
 Yes, here comes the biggest question raised from this case: obviously,
 when none of the nodes possess the correct topology information
 (including the one which deleted the central node), there is no way to
 fix it by adding segments connecting the nodes that became disconnected. 
 It should not need the full information, but it has to be able to reach
 one of the nodes to be connected. When the topology is broken, you lose
 the ability to apply a change on any node; e.g. in your case if
 you want to connect vm036 and vm056 and have removed vm175, you have to do
 it on vm056, vm036 or vm244. This should work, if not we have to fix it -
 unless we completely prevent disconnecting a topology
 Well, this is exactly the problem here: all replicas should contain
 precise copies of all the info: accounts, hosts, sudorules, etc., including
 topology information. However, if in this case I manually connect
 the disconnected node at vm127 (or vm056, it does not matter), it results in
 topology information inconsistency across the infrastructure:
 This would be the topology from the point of view of vm127:
 did you add the connection on vm127 or on vm244? sorry, but in these
 situations, to understand what's going on, it can matter.
 to me it looks like you did it on vm127, so it's there, it got replicated to
 vm244, but replication back does not work, and so the deletion of the segments to
 vm175, which should still be in the changelogs of 036 and 244, doesn't get to
 127. Do you have anything in the error logs of 244?
 Yes, I added the connection on vm127. vm244 does not have anything in the
 ldap errors log corresponding to the replication with vm127. In fact, I
 tried to create a user on vm244 to see if it will be replicated to vm127,
 and the user creation failed with the following error message:
 Operations error: Allocation of a new value for range cn=posix
 ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config failed!
 Unable to proceed.

 Is it because the master node was deleted?
 think so, yes.
 There are probably more things to check before removing a server :-(

This particular error is caused by the way we distribute DNA ranges among
servers. The range is assigned only on first use (not during replica
installation), so when the original master is gone you have no way to
obtain the range (if you did not need it before).
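The on-first-use behaviour can be illustrated with a toy model (a simplification for illustration only; the class names are invented and this is not the real 389-ds DNA plugin logic):

```python
# Toy model of DNA range distribution: a replica fetches its ID range
# from the range-holding master on FIRST allocation, not at install
# time. If the master is deleted before that first allocation, the
# replica has no range and allocation fails, as reported above.

class RangeMaster:
    """Holds the global ID pool and hands out chunks on request."""
    def __init__(self, start=1000, chunk=100):
        self.next_value, self.chunk = start, chunk

    def hand_out_range(self):
        r = list(range(self.next_value, self.next_value + self.chunk))
        self.next_value += self.chunk
        return r

class Replica:
    def __init__(self, name):
        self.name, self.ids = name, None  # no range at install time

    def allocate_id(self, master):
        if self.ids is None:              # first use: ask the master
            if master is None:            # master already removed
                raise RuntimeError("dna_pre_op: no more values available")
            self.ids = master.hand_out_range()
        return self.ids.pop(0)

master = RangeMaster()
early = Replica("vm244")
early.allocate_id(master)   # works: chunk fetched while the master is alive

late = Replica("vm127")     # never allocated before the master was deleted
try:
    late.allocate_id(None)
except RuntimeError as e:
    print(e)                # → dna_pre_op: no more values available
```

A replica that allocated at least one ID while the master was reachable keeps working from its local chunk; one that never did is stuck, which is why the user add on vm244 failed with the dna_pre_op error.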

This is tracked as
https://bugzilla.redhat.com/show_bug.cgi?id=1211366

Please comment here so we do not forget how annoying it is :-)

Petr^2 Spacek

 The corresponding message in the error log is
 [24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values
 available!!

-- 
Manage your subscription for the Freeipa-devel mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-devel
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code


Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Ludwig Krispenz


On 06/24/2015 12:50 PM, Oleg Fayans wrote:



On 06/24/2015 12:28 PM, Ludwig Krispenz wrote:


On 06/24/2015 12:02 PM, Oleg Fayans wrote:



On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break 
the system :)
should we completely prohibit this? I don't know, I think you 
could also enforce an uninstall of vm175 with probably the same 
result.
what you mean by calculating the remaining topology and sending it 
to the remaining servers does not work, it would require sending 
a removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should 
invest in making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: 
obviously, when none of the nodes possess the correct topology 
information (including the one which deleted the central node), 
there is no way to fix it by adding segments connecting the nodes 
that became disconnected. 
It should not need the full information, but it has to be able to 
reach one of the nodes to be connected. When the topology is 
broken, you lose the ability to apply a change on any 
node; e.g. in your case, if you want to connect vm036 and vm056 and 
have removed vm175, you have to do it on vm056, vm036 or vm244. 
This should work, if not we have to fix it - unless we completely 
prevent disconnecting a topology
Well, this is exactly the problem here: all replicas should contain 
precise copies of all the info: accounts, hosts, sudorules, etc., 
including topology information. However, if in this case I manually 
connect the disconnected node at vm127 (or vm056, it does not matter), it 
results in topology information inconsistency across the infrastructure:

This would be the topology from the point of view of vm127:
did you add the connection on vm127 or on vm244? sorry, but in these 
situations, to understand what's going on, it can matter.
to me it looks like you did it on vm127, so it's there, it got 
replicated to vm244, but replication back does not work, and so the 
deletion of the segments to vm175, which should still be in the 
changelogs of 036 and 244, doesn't get to 127. Do you have anything in 
the error logs of 244?
Yes, I added the connection on vm127. vm244 does not have anything in 
the ldap errors log corresponding to the replication with vm127. In 
fact, I tried to create a user on vm244 to see if it will be 
replicated to vm127, and the user creation failed with the following 
error message:
Operations error: Allocation of a new value for range cn=posix 
ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config 
failed! Unable to proceed.


Is it because the master node was deleted?

think so, yes.
There are probably more things to check before removing a server :-(


The corresponding message in the error log is
[24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values 
available!!




vm056  vm036
 \/  |
 vm175 |
  \  |
vm127   vm244

And this - from the point of view of vm244 and vm036

vm056  vm036
 \   |
 vm175 |
 |
vm127   -  vm244
I still think that the recalculation of the resulting tree should 
be done at least on the node that performs the removal action. And 
when later some other node gets connected, it should understand 
somehow that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 
from Petr) allows the deletion of the central node in the star 
topology.

I had the following topology:

vm056  vm036
 \ / |
 vm175 |
 / \ |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will 
be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Oleg Fayans



On 06/24/2015 12:28 PM, Ludwig Krispenz wrote:


On 06/24/2015 12:02 PM, Oleg Fayans wrote:



On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:


On 06/24/2015 11:36 AM, Oleg Fayans wrote:



On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg,

the topology plugin relies on existing connections between servers 
which remain in a topology. If you remove a central node in your 
topology you are asking for trouble.
With Petr's patch it warns you that your topology will be 
disconnected, and if you insist we cannot guarantee anything.
Agree. I just wanted to try edge cases to see how one can break the 
system :)
should we completely prohibit this? I don't know, I think you 
could also enforce an uninstall of vm175 with probably the same 
result.
what you mean by calculating the remaining topology and sending it to 
the remaining servers does not work, it would require sending a 
removal of a segment, which would be rejected.


The topology is broken, and I don't know how much we should invest 
in making this info consistent on all servers.


More interesting would be if we can heal this later by adding new 
segments.
Yes, here comes the biggest question raised from this case: 
obviously, when none of the nodes possess the correct topology 
information (including the one which deleted the central node), 
there is no way to fix it by adding segments connecting the nodes 
that became disconnected. 
It should not need the full information, but it has to be able to 
reach one of the nodes to be connected. When the topology is broken, 
you lose the ability to apply a change on any node; e.g. in 
your case, if you want to connect vm036 and vm056 and have removed 
vm175, you have to do it on vm056, vm036 or vm244. This should work, 
if not we have to fix it - unless we completely prevent 
disconnecting a topology
Well, this is exactly the problem here: all replicas should contain 
precise copies of all the info: accounts, hosts, sudorules, etc., 
including topology information. However, if in this case I manually 
connect the disconnected node at vm127 (or vm056, it does not matter), it 
results in topology information inconsistency across the infrastructure:

This would be the topology from the point of view of vm127:
did you add the connection on vm127 or on vm244? sorry, but in these 
situations, to understand what's going on, it can matter.
to me it looks like you did it on vm127, so it's there, it got 
replicated to vm244, but replication back does not work, and so the 
deletion of the segments to vm175, which should still be in the changelogs 
of 036 and 244, doesn't get to 127. Do you have anything in the error 
logs of 244?
Yes, I added the connection on vm127. vm244 does not have anything in 
the ldap errors log corresponding to the replication with vm127. In 
fact, I tried to create a user on vm244 to see if it will be replicated 
to vm127, and the user creation failed with the following error message:
Operations error: Allocation of a new value for range cn=posix 
ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config 
failed! Unable to proceed.


Is it because the master node was deleted?
The corresponding message in the error log is
[24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values 
available!!




vm056  vm036
 \/  |
 vm175 |
  \  |
vm127   vm244

And this - from the point of view of vm244 and vm036

vm056  vm036
 \   |
 vm175 |
 |
vm127   -  vm244
I still think that the recalculation of the resulting tree should 
be done at least on the node that performs the removal action. And 
when later some other node gets connected, it should understand 
somehow that its topology information is outdated


Ludwig
On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

Current implementation of topology plugin (including patch 878 
from Petr) allows the deletion of the central node in the star 
topology.

I had the following topology:

vm056   vm036
   \   /  |
    vm175 |
   /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will 
be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-244.idm.lab.eng.brq.redhat.com, 
vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com

Continue to 

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Simo Sorce
On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
 Oleg,
 
 the topology plugin relies on existing connections between servers which 
 remain in a topology. If you remove a central node in your topology you 
 are asking for trouble.
 With Petr's patch it warns you that your topology will be disconnected, 
 and if you insist we cannot guarantee anything.
 should we completely prohibit this ? 

No, but a --force should be needed.
Without a --force option we should not allow to remove a replica
completely from another one.

 I don't know, I think you could 
 also enforce an uninstall of vm175 with probably the same result.
 What you mean by calculating the remaining topology and sending it to the 
 remaining servers does not work; it would require sending a removal of a 
 segment, which would be rejected.

You would have to connect to each replica that has a replication
agreement with vm175 and remove the segment from that replica. But it
wouldn't really help much as once a replica is isolated from the central
one, it will not see the other operations going on in other replicas.

Once we have a topology resolver we will be able to warn that removing a
specific replica will cause a split brain and make very loud warnings
and even offer solutions on how to reconnect the remaining replicas, but
nothing else can really be done if the admin insists on breaking the
replication topology, I guess.

 The topology is broken, and I don't know how much we should invest in 
 making this info consistent on all servers.

We just need to make it very clear to the admin that replication is
broken; later on we'll have visual tools to make it easier to understand
what is going on, but that's all we can do.

 More interesting would be if we can heal this later by adding new segments.

Indeed, reconnecting all the severed replicas should cause all the
removals (segments or servers) to be replicated among servers and should
bring the topology view back to a consistent state. But not until all
servers are reconnected and replication has started again.
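A healing tool along these lines could find the disconnected components
and propose one new segment per extra component. A minimal sketch under
those assumptions (illustrative code, not part of ipa-replica-manage):

```python
# Sketch: find the connected components of a broken topology and propose
# the segments needed to chain them back together. Pure illustration.

def components(nodes, segments):
    """Return the connected components as sorted lists of server names."""
    adj = {n: set() for n in nodes}
    for a, b in segments:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = {n}, [n]
        while stack:
            for nbr in adj[stack.pop()]:
                if nbr not in comp:
                    comp.add(nbr)
                    stack.append(nbr)
        seen |= comp
        comps.append(sorted(comp))
    return comps

def healing_segments(nodes, segments):
    """Propose one new segment per extra component to reconnect everything."""
    comps = components(nodes, segments)
    anchor = comps[0][0]
    return [(anchor, comp[0]) for comp in comps[1:]]

# A possible state after vm175 was deleted: only 036-244 and 127-244 remain,
# so vm056 is stranded on its own.
nodes = ["vm056", "vm036", "vm127", "vm244"]
segments = [("vm036", "vm244"), ("vm127", "vm244")]
print(healing_segments(nodes, segments))  # [('vm056', 'vm036')]
```

Each proposed pair would then be turned into a real segment on a server
inside the surviving component, so the removals queued in its changelog
can finally replicate out.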

Simo.


 Ludwig
 On 06/24/2015 11:04 AM, Oleg Fayans wrote:
  Hi everybody,
 
  Current implementation of topology plugin (including patch 878 from 
  Petr) allows the deletion of the central node in the star topology.
  I had the following topology:
 
  vm056   vm036
     \   /  |
      vm175 |
     /   \  |
  vm127   vm244
 
  I was able to remove node vm175 from node vm244:
 
  [17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del 
  vm-175.idm.lab.eng.brq.redhat.com
  Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be 
  disconnected:
  Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: 
  vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
  Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: 
  vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, 
  vm-127.idm.lab.eng.brq.redhat.com
  Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: 
  vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, 
  vm-036.idm.lab.eng.brq.redhat.com
  Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: 
  vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
  Continue to delete? [no]: yes
  Waiting for removal of replication agreements
  unexpected error: limits exceeded for this query
 
  I would expect this operation to delete 4 replication agreements on 
  all nodes:
  vm056 - vm175
  vm127 - vm175
  vm244 - vm175
  vm036 - vm175
 
  However, an arbitrary set of replication agreements was deleted on each 
  node, leading to total infrastructure inconsistency:
  ===
  vm056 thought the topology was as follows:
  vm056   vm036
         /  |
      vm175 |
     /   \  |
  vm127   vm244
  [10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
  --
  4 segments matched
  --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
 
Segment name: 
  vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
 
Segment name: 
  vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-127.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
 
Segment name: 
  vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com
Left node: vm-175.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
  
  Number of entries returned 4
  
  

Re: [Freeipa-devel] Topology: Central node removal in star topology

2015-06-24 Thread Simo Sorce
On Wed, 2015-06-24 at 15:01 -0400, Simo Sorce wrote:
 
 No, but a --force should be needed.
 Without a --force option we should not allow to remove a replica
 completely from another one.

I meant to add: if that action breaks the topology.
I think it is ok if we are removing a leaf from a central node.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York

-- 
Manage your subscription for the Freeipa-devel mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-devel
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code