Re: [Freeipa-devel] Topology: Central node removal in star topology
On Thu, 2015-06-25 at 09:53 +0200, Petr Vobornik wrote:
> On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:
> >
> > On 06/24/2015 09:01 PM, Simo Sorce wrote:
> >> On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
> >>> Oleg,
> >>>
> >>> the topology plugin relies on existing connections between servers which
> >>> remain in a topology. If you remove a central node in your topology you
> >>> are asking for trouble.
> >>> With Petr's patch it warns you that your topology will be disconnected,
> >>> and if you insist we cannot guarantee anything.
> >>> Should we completely prohibit this?
> >> No, but a --force should be needed.
> >> Without a --force option we should not allow removing a replica
> >> completely from another one.
> >>
> >>> I don't know, I think you could
> >>> also enforce an uninstall of vm175 with probably the same result.
> >>> What you mean by calculating the remaining topology and sending it to
> >>> the remaining servers does not work; it would require sending a removal
> >>> of a segment, which would be rejected.
> >> You would have to connect to each replica that has a replication
> >> agreement with vm175 and remove the segment from that replica. But it
> >> wouldn't really help much, as once a replica is isolated from the central
> >> one, it will not see the other operations going on in other replicas.
> >>
> >> Once we have a topology resolver we will be able to warn that removing a
> >> specific replica will cause a split brain and make very loud warnings
> > we have this already, see the output of Oleg's example:
> >
> > ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
> > Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
> > disconnected:
> > Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
> > Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com,
> > vm-127.idm.lab.eng.brq.redhat.com
> > Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com,
> > vm-036.idm.lab.eng.brq.redhat.com
> > Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers:
> > vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
> > Continue to delete? [no]: yes
> >
> > it tells you that the topology gets disconnected and which connections
> > will be missing; the continue yes/no is the --force.
> > The question was: should we allow a force in this situation?
>
> What it does is:
> 1. Checks current topology, prints errors with introduction msg:
>    "Current topology is disconnected:" + errors
> 2. Checks topology after node removal, prints errors with msg:
>    "Topology after removal of %s will be disconnected:" + errors
> 3. If there were errors in #1 or #2, it does:
>    if not force and not ipautil.user_input("Continue to delete?", False):
>        sys.exit("Aborted")
>
> To make it more loud we can introduce the msg in #2 with "WARNING: " or
> something even louder.
>
> The question "Continue to delete?" could be
> * removed, and therefore --force will always be required for such a case
> * still be regarded as 'force' but the question could be changed, e.g.
> to: "Continue to delete and disconnect the topology?"

I do not like questions very much; they are usually annoying to
scripting and such. I would not ask questions, and simply deny the
operation if --force is not present, and allow it if it is present.

> >>> More interesting would be if we can heal this later by adding new
> >>> segments.
> >> Indeed, reconnecting all the severed replicas should cause all the
> >> removals (segments or servers) to be replicated among servers and should
> >> bring back the topology view in a consistent state. But not until all
> >> servers are reconnected and replication has started again.
> > This healing can also be required without forcing removal by an admin.
> > If you have a star topology and your central node goes down and is not
> > recoverable

Yes, I think the most likely case (bar testing) for ever using --force
remove is that a server imploded and died, and just needs replacing.

Being able to recover from such a situation by simply reconnecting
replicas until the split brain is healed is paramount.

I would go as far as saying that perhaps we should provide a simple
"heal-topology" command in a *future* version that will pick one replica
and reconnect all the missing branches in a stellar topology.

The only problem in doing that is that the tool may have a misleading
idea of the status of the topology, given that when replication is
severed not all topology changes may be reflected to all servers. So
different servers may have a different view of the current topology,
based on when they got disconnected and the replication flow was
interrupted. So a good tool would have to reconnect all branches
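The flow discussed above (Petr's two connectivity checks, plus Simo's suggestion to deny the removal outright unless --force is given, with no interactive question) can be sketched roughly like this. This is an illustrative sketch, not the actual ipa-replica-manage code; `is_connected` and `check_removal` are hypothetical helper names:

```python
from collections import defaultdict

def is_connected(servers, segments, removed=None):
    """True if all servers (minus `removed`) can still reach each other
    over the remaining replication segments, treated as an undirected graph."""
    nodes = set(servers) - {removed}
    graph = defaultdict(set)
    for left, right in segments:
        if left in nodes and right in nodes:
            graph[left].add(right)
            graph[right].add(left)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:                       # iterative graph traversal
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes

def check_removal(servers, segments, victim, force=False):
    """Refuse a disconnecting removal unless --force was given,
    instead of asking an interactive question."""
    if not is_connected(servers, segments, removed=victim) and not force:
        raise SystemExit("Removal of %s would disconnect the topology; "
                         "use --force to override" % victim)

# Oleg's topology: vm175 as the hub, plus the 036-to-244 segment.
servers = ["vm175", "vm036", "vm056", "vm127", "vm244"]
segments = [("vm175", "vm036"), ("vm175", "vm056"),
            ("vm175", "vm127"), ("vm175", "vm244"),
            ("vm036", "vm244")]
```

With this topology, `check_removal(servers, segments, "vm175")` exits unless `force=True`, while removing the leaf vm127 passes without --force.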
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 25.6.2015 09:53, Petr Vobornik wrote:
> On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:
> [...]
>
> What it does is:
> 1. Checks current topology, prints errors with introduction msg:
>    "Current topology is disconnected:" + errors
> 2. Checks topology after node removal, prints errors with msg:
>    "Topology after removal of %s will be disconnected:" + errors
> 3. if there were errors in #1 or #2, it does:
>    if not force and not ipautil.user_input("Continue to delete?", False):
>        sys.exit("Aborted")
>
> To make it more loud we can introduce msg in #2 with: "WARNING: " or
> something even louder
>
> The question "Continue to delete?" could be
> * removed, and therefore --force will always be required for such a case
> * still be regarded as 'force' but the question could be changed, e.g. to:
>   "Continue to delete and disconnect the topology?"
Nitpick: I'm not a native English speaker, but "Current topology is
disconnected" does not sound clear and scary enough to me. At the very
least, the line should start with "WARNING:" to follow the same pattern
as all other warnings.

Also it would be nice to add something descriptive like "Changes will
not be replicated to all servers and data WILL become inconsistent."
Or possibly "GATE TO HELL IS WIDE OPEN"? :-) Of course all this needs to
be rephrased to proper English ...

Petr^2 Spacek

[...]

--
Manage your subscription for the Freeipa-devel mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-devel
Contribute to FreeIPA: http://www.freeipa.org/page/Contribute/Code
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/25/2015 08:52 AM, Ludwig Krispenz wrote:
> [...]
> it tells you that the topology gets disconnected and which connections
> will be missing, the continue yes/no is the --force,
> the question was, should we allow a force in this situation ?

What it does is:
1. Checks current topology, prints errors with introduction msg:
   "Current topology is disconnected:" + errors
2. Checks topology after node removal, prints errors with msg:
   "Topology after removal of %s will be disconnected:" + errors
3. If there were errors in #1 or #2, it does:
   if not force and not ipautil.user_input("Continue to delete?", False):
       sys.exit("Aborted")

To make it more loud we can introduce the msg in #2 with "WARNING: " or
something even louder.

The question "Continue to delete?" could be
* removed, and therefore --force will always be required for such a case
* still be regarded as 'force' but the question could be changed, e.g. to:
  "Continue to delete and disconnect the topology?"

--
Petr Vobornik
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 09:01 PM, Simo Sorce wrote:
> On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
> [...]
> You would have to connect to each replica that has a replication
> agreement with vm175 and remove the segment from that replica. But it
> wouldn't really help much as once a replica is isolated from the central
> one, it will not see the other operations going on in other replicas.
>
> Once we have a topology resolver we will be able to warn that removing a
> specific replica will cause a split brain and make very loud warnings

we have this already, see the output of Oleg's example:

ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
[...]
Continue to delete? [no]: yes

it tells you that the topology gets disconnected and which connections
will be missing; the continue yes/no is the --force.
The question was: should we allow a force in this situation?

> More interesting would be if we can heal this later by adding new
> segments.
> Indeed, reconnecting all the severed replicas should cause all the
> removals (segments or servers) to be replicated among servers and should
> bring back the topology view in a consistent state. But not until all
> servers are reconnected and replication has started again.

This healing can also be required without forcing removal by an admin,
if you have a star topology and your central node goes down and is not
recoverable.

> Simo.

Ludwig

> On 06/24/2015 11:04 AM, Oleg Fayans wrote:
> > Hi everybody,
> >
> > Current implementation of the topology plugin (including patch 878 from
> > Petr) allows the deletion of the central node in the star topology.
> > I had the following topology:
> >
> >     vm056   vm036
> >        \   /   |
> >        vm175   |
> >        /   \   |
> >     vm127   vm244
> >
> > I was able to remove node vm175 from node vm244:
> >
> > [17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del
> > vm-175.idm.lab.eng.brq.redhat.com
> > Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be
> > disconnected:
> > [...]
> > Continue to delete? [no]: yes
> > Waiting for removal of replication agreements
> > unexpected error: limits exceeded for this query
> >
> > I would expect this operation to delete 4 replication agreements on
> > all nodes:
> > vm056 - vm175
> > vm127 - vm175
> > vm244 - vm175
> > vm036 - vm175
> >
> > However, an arbitrary set of replication agreements was deleted on each
> > node, leading to total infrastructure inconsistency:
> > ===
> > vm056 thought the topology was as follows:
> >
> >     vm056   vm036
> >            /   |
> >        vm175   |
> >        /   \   |
> >     vm127   vm244
> >
> > [10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
> > --
> > 4 segments matched
> > --
> >   Segment name: 036-to-244
> >   Left node: vm-036.idm.lab.eng.brq.redhat.com
> >   Right node: vm-244.idm.lab.eng.brq.redhat.com
> >   Connectivity: both
> >
> >   Segment name:
> >   vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
> >   Left node: vm-036.idm.lab.eng.brq.redhat.com
> >   Right node: v
Re: [Freeipa-devel] Topology: Central node removal in star topology
On Wed, 2015-06-24 at 15:01 -0400, Simo Sorce wrote:
>
> No, but a --force should be needed.
> Without a --force option we should not allow to remove a replica
> completely from another one.

I meant to add: if that action breaks the topology. I think it is "ok"
if we are removing a leaf from a central node.

Simo.

--
Simo Sorce * Red Hat, Inc * New York
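The leaf case Simo allows can be detected cheaply without a full connectivity check. A minimal sketch (hypothetical helper, not actual FreeIPA code): a server that appears in exactly one segment is a leaf, and deleting it removes no path between any two other servers, so no --force would be needed:

```python
# Oleg's topology: vm175 as the hub, plus the 036-to-244 segment.
segments = [
    ("vm175", "vm036"), ("vm175", "vm056"),
    ("vm175", "vm127"), ("vm175", "vm244"),
    ("vm036", "vm244"),
]

def is_leaf(segments, node):
    # A server with exactly one replication agreement is a leaf;
    # removing it can never disconnect the remaining servers.
    return sum(node in seg for seg in segments) == 1

print(is_leaf(segments, "vm127"))  # True: only the vm175-vm127 segment
print(is_leaf(segments, "vm175"))  # False: four agreements, it is the hub
```

Note the leaf test is sufficient but not necessary: vm036 here is not a leaf, yet removing it is also safe because it sits on a cycle; the general answer still needs a connectivity check on the remaining graph.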
Re: [Freeipa-devel] Topology: Central node removal in star topology
On Wed, 2015-06-24 at 11:25 +0200, Ludwig Krispenz wrote:
> Oleg,
>
> the topology plugin relies on existing connections between servers which
> remain in a topology. If you remove a central node in your topology you
> are asking for trouble.
> With Petr's patch it warns you that your topology will be disconnected,
> and if you insist we cannot guarantee anything.
> Should we completely prohibit this?

No, but a --force should be needed.
Without a --force option we should not allow to remove a replica
completely from another one.

> I don't know, I think you could
> also enforce an uninstall of vm175 with probably the same result.
> What you mean by calculating the remaining topology and sending it to the
> remaining servers does not work; it would require sending a removal of a
> segment, which would be rejected.

You would have to connect to each replica that has a replication
agreement with vm175 and remove the segment from that replica. But it
wouldn't really help much as once a replica is isolated from the central
one, it will not see the other operations going on in other replicas.

Once we have a topology resolver we will be able to warn that removing a
specific replica will cause a split brain, make very loud warnings, and
even offer solutions on how to reconnect the remaining replicas, but
nothing else can really be done if the admin insists on breaking the
replication topology, I guess.

> The topology is broken, and I don't know how much we should invest in
> making this info consistent on all servers.

We just need to make it very clear to the admin that replication is
broken; later on we'll have visual tools to make it easier to understand
what is going on, but that's all we can do.

> More interesting would be if we can heal this later by adding new segments.

Indeed, reconnecting all the severed replicas should cause all the
removals (segments or servers) to be replicated among servers and should
bring back the topology view in a consistent state.
But not until all servers are reconnected and replication has started
again.

Simo.

> Ludwig
>
> On 06/24/2015 11:04 AM, Oleg Fayans wrote:
> [...]
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 24.6.2015 13:09, Ludwig Krispenz wrote:
>
> On 06/24/2015 12:50 PM, Oleg Fayans wrote:
>>
>> On 06/24/2015 12:28 PM, Ludwig Krispenz wrote:
>>>
>>> On 06/24/2015 12:02 PM, Oleg Fayans wrote:
>>>> On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:
>>>>>
>>>>> On 06/24/2015 11:36 AM, Oleg Fayans wrote:
>>>>>>
>>>>>> On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:
>>>>>>> Oleg,
>>>>>>>
>>>>>>> the topology plugin relies on existing connections between servers
>>>>>>> which remain in a topology. If you remove a central node in your
>>>>>>> topology you are asking for trouble.
>>>>>>> With Petr's patch it warns you that your topology will be
>>>>>>> disconnected, and if you insist we cannot guarantee anything.
>>>>>> Agree. I just wanted to try edge cases to see how one can break the
>>>>>> system :)
>>>>>>> should we completely prohibit this ? I don't know, I think you could
>>>>>>> also enforce an uninstall of vm175 with probably the same result.
>>>>>>> What you mean by calculating the remaining topology and sending it
>>>>>>> to the remaining servers does not work; it would require sending a
>>>>>>> removal of a segment, which would be rejected.
>>>>>>>
>>>>>>> The topology is broken, and I don't know how much we should invest
>>>>>>> in making this info consistent on all servers.
>>>>>>>
>>>>>>> More interesting would be if we can heal this later by adding new
>>>>>>> segments.
>>>>>> Yes, here comes the biggest question raised by this case: obviously,
>>>>>> when none of the nodes possesses the correct topology information
>>>>>> (including the one which deleted the central node), there is no way
>>>>>> to fix it by adding segments connecting the nodes that became
>>>>>> disconnected.
>>>>> It should not need the full information, but it has to be able to
>>>>> reach one of the nodes to be connected. When the topology is broken,
>>>>> you lose the ability to apply a change on any node; e.g. in your case,
>>>>> if you want to connect vm036 and vm056 and have removed vm175, you
>>>>> have to do it on vm056, vm036 or vm244.
>>>>> This should work; if not, we have to fix it -
>>>>> unless we completely prevent disconnecting a topology.
>>>> Well, this is exactly the problem here: all replicas should contain
>>>> precise copies of all the info: accounts, hosts, sudorules, etc.,
>>>> including topology information. However, if in this case I manually
>>>> connect the disconnected node at vm127 (or vm056, does not matter) it
>>>> results in topology information inconsistency across the
>>>> infrastructure.
>>>>
>>>> This would be the topology from the point of view of vm127:
>>> did you add the connection on vm127 or on vm244? Sorry, but in these
>>> situations, to understand what's going on, it can matter.
>>> To me it looks like you did it on vm127, so it's there, it got
>>> replicated to vm244, but replication back does not work and so the
>>> deletion of the segments to vm175, which should still be in the
>>> changelogs of 036 and 244, doesn't get to 127. Do you have something
>>> in the error logs of 244?
>> Yes, I added the connection on vm127. vm244 does not have anything in
>> the ldap errors log corresponding to the replication with vm127. In
>> fact, I tried to create a user on vm244 to see if it would be
>> replicated to vm127, and the user creation failed with the following
>> error message:
>> Operations error: Allocation of a new value for range cn=posix
>> ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config
>> failed! Unable to proceed.
>>
>> Is it because the master node was deleted?
> think so, yes.
> There are probably more things to check before removing a server :-(

This particular error is caused by the way we distribute DNA ranges
among servers. The range is assigned only on first use (not during
replica installation), so when the original master is gone you have no
way to obtain the range (if you did not need it before).
This is tracked as https://bugzilla.redhat.com/show_bug.cgi?id=1211366
Please comment here so we do not forget how annoying it is :-)

Petr^2 Spacek

>> The corresponding message in the error log is
>> [24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values
>> available!!
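The lazy range assignment Petr describes can be illustrated with a toy model. This is purely a sketch of the failure mode, not the real 389-ds DNA plugin: a replica asks the master for its ID range only on its first allocation, so a replica that never allocated before the master disappeared is stuck:

```python
class DNAMaster:
    """Toy model of the server that owns the global ID range."""
    def __init__(self, start, end):
        self.next_free = start
        self.end = end
        self.alive = True

    def delegate_range(self, size=1000):
        # Hands out a sub-range on demand; fails once the master is gone.
        if not self.alive:
            raise RuntimeError("no more values available!!")
        lo = self.next_free
        self.next_free += size
        return (lo, lo + size - 1)


class Replica:
    def __init__(self, master):
        self.master = master
        self.range = None      # nothing is assigned at install time
        self.next_id = None

    def allocate_id(self):
        if self.range is None:
            # First use: only now does the replica fetch a range.
            self.range = self.master.delegate_range()
            self.next_id = self.range[0]
        uid = self.next_id
        self.next_id += 1
        return uid


master = DNAMaster(100000, 199999)
early = Replica(master)
early.allocate_id()      # got its range while the master was alive

late = Replica(master)
master.alive = False     # central node removed before `late` ever allocated
try:
    late.allocate_id()
except RuntimeError as err:
    print("allocation failed:", err)
```

This matches the report above: vm244 had never needed an ID before, so its first allocation after vm175's removal failed with dna_pre_op's "no more values available".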
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 12:50 PM, Oleg Fayans wrote:
> [...]
> Is it because the master node was deleted?

think so, yes.
There are probably more things to check before removing a server :-(

> The corresponding message in the error log is
> [24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values
> available!!
>
> This would be the topology from the point of view of vm127:
>
>     vm056   vm036
>        \ /     |
>        vm175   |
>            \   |
>     vm127   vm244
>
> And this - from the point of view of vm244 and vm036:
>
>     vm056   vm036
>        \       |
>        vm175   |
>                |
>     vm127 - vm244
>
> I still think that the recalculation of the resulting tree should be
> done at least on the node that performs the removal action. And when
> later some other node gets connected, it should understand somehow that
> its topology information is outdated.

Ludwig

> On 06/24/2015 11:04 AM, Oleg Fayans wrote:
> [...]
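A healing tool along the lines Simo suggested earlier would have to reconcile exactly these divergent per-server views. One conservative approach (purely illustrative; no such FreeIPA command exists, and `merge_views` is a hypothetical helper with segment sets loosely based on Oleg's report) is to union the segments that any reachable server still believes in, then drop those referencing the deleted node:

```python
def merge_views(views, removed):
    """Combine per-server topology views (sets of (left, right) segments)
    into one candidate topology: take the union of all reported segments,
    normalized to sorted pairs, and drop segments touching the deleted
    server, then reconnect whatever is still disconnected."""
    merged = set()
    for segments in views:
        merged |= {tuple(sorted(s)) for s in segments}
    return {s for s in merged if removed not in s}

# Two divergent views after vm175 was force-removed (illustrative):
view_vm127 = {("vm175", "vm056"), ("vm175", "vm036"),
              ("vm175", "vm244"), ("vm036", "vm244")}
view_vm244 = {("vm175", "vm056"), ("vm036", "vm244"),
              ("vm127", "vm244")}

print(sorted(merge_views([view_vm127, view_vm244], "vm175")))
# -> [('vm036', 'vm244'), ('vm127', 'vm244')]
```

Taking the union rather than the intersection errs on the side of keeping agreements, which fits the thread's observation that deletions may not have replicated everywhere; the tool would still need to verify each surviving segment against the live servers.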
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 12:28 PM, Ludwig Krispenz wrote:
On 06/24/2015 12:02 PM, Oleg Fayans wrote:
On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:
On 06/24/2015 11:36 AM, Oleg Fayans wrote:
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected.

It should not need the full information, but it has to be able to reach one of the nodes to be connected. When the topology is broken, you lose the ability to apply a change on any node; e.g. in your case, if you want to connect vm036 and vm056 and have removed vm175, you have to do it on vm056, vm036 or vm244. This should work; if not, we have to fix it - unless we completely prevent disconnecting a topology.

Well, this is exactly the problem here: all replicas should contain precise copies of all the info: accounts, hosts, sudorules, etc., including topology information. However, if in this case I manually connect the disconnected node at vm127 (or vm056, it does not matter), it results in topology information inconsistency across the infrastructure. This would be the topology from the point of view of vm127:

Did you add the connection on vm127 or on vm244? Sorry, but in these situations it can matter for understanding what's going on. To me it looks like you did it on vm127, so it's there and it got replicated to vm244, but replication back does not work, and so the deletion of the segments to vm175, which should still be in the changelogs of 036 and 244, doesn't get to 127. Do you have something in the error logs of 244?

Yes, I added the connection on vm127. vm244 does not have anything in the ldap errors log corresponding to the replication with vm127. In fact, I tried to create a user on vm244 to see if it would be replicated to vm127, and the user creation failed with the following error message:

Operations error: Allocation of a new value for range cn=posix ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config failed! Unable to proceed.

Is it because the master node was deleted?

The corresponding message in the error log is:

[24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values available!!

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127   vm244

And this - from the point of view of vm244 and vm036:

vm056   vm036
    \      |
    vm175  |
      |    |
vm127 - vm244

I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete?
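For context on the dna_pre_op failure quoted above: the DNA plugin allocates POSIX IDs from a per-server value range, and an allocation fails once the range is exhausted (or a server never obtained one, e.g. because the server holding the spare ranges was removed). A toy sketch of that failure mode, with invented class names; this is not the 389-ds plugin code:

```python
# Toy sketch of the DNA failure mode -- invented names, NOT the 389-ds
# DNA plugin. An add operation that needs a new ID fails once no values
# remain in the server's range.
class RangeExhausted(Exception):
    """Mirrors the logged "dna_pre_op: no more values available!!"."""

class DnaRange:
    def __init__(self, next_value, max_value):
        self.next_value = next_value
        self.max_value = max_value

    def allocate(self):
        # Pre-operation check: refuse when the range has no values left.
        if self.next_value > self.max_value:
            raise RangeExhausted("no more values available")
        value = self.next_value
        self.next_value += 1
        return value

ids = DnaRange(next_value=1101, max_value=1102)
assert ids.allocate() == 1101
assert ids.allocate() == 1102
# A third allocate() raises RangeExhausted, which the server surfaces
# to the LDAP client as an "Operations error".
```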
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 12:02 PM, Oleg Fayans wrote:
On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:
On 06/24/2015 11:36 AM, Oleg Fayans wrote:
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected.

It should not need the full information, but it has to be able to reach one of the nodes to be connected. When the topology is broken, you lose the ability to apply a change on any node; e.g. in your case, if you want to connect vm036 and vm056 and have removed vm175, you have to do it on vm056, vm036 or vm244. This should work; if not, we have to fix it - unless we completely prevent disconnecting a topology.

Well, this is exactly the problem here: all replicas should contain precise copies of all the info: accounts, hosts, sudorules, etc., including topology information. However, if in this case I manually connect the disconnected node at vm127 (or vm056, it does not matter), it results in topology information inconsistency across the infrastructure. This would be the topology from the point of view of vm127:

Did you add the connection on vm127 or on vm244? Sorry, but in these situations it can matter for understanding what's going on. To me it looks like you did it on vm127, so it's there and it got replicated to vm244, but replication back does not work, and so the deletion of the segments to vm175, which should still be in the changelogs of 036 and 244, doesn't get to 127. Do you have something in the error logs of 244?

Yes, I added the connection on vm127. vm244 does not have anything in the ldap errors log corresponding to the replication with vm127. In fact, I tried to create a user on vm244 to see if it would be replicated to vm127, and the user creation failed with the following error message:

Operations error: Allocation of a new value for range cn=posix ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config failed! Unable to proceed.

Is it because the master node was deleted?

The corresponding message in the error log is:

[24/Jun/2015:12:44:18 +0200] dna-plugin - dna_pre_op: no more values available!!

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127   vm244

And this - from the point of view of vm244 and vm036:

vm056   vm036
    \      |
    vm175  |
      |    |
vm127 - vm244

I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 12:02 PM, Oleg Fayans wrote:
On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:
On 06/24/2015 11:36 AM, Oleg Fayans wrote:
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected.

It should not need the full information, but it has to be able to reach one of the nodes to be connected. When the topology is broken, you lose the ability to apply a change on any node; e.g. in your case, if you want to connect vm036 and vm056 and have removed vm175, you have to do it on vm056, vm036 or vm244. This should work; if not, we have to fix it - unless we completely prevent disconnecting a topology.

Well, this is exactly the problem here: all replicas should contain precise copies of all the info: accounts, hosts, sudorules, etc., including topology information. However, if in this case I manually connect the disconnected node at vm127 (or vm056, it does not matter), it results in topology information inconsistency across the infrastructure. This would be the topology from the point of view of vm127:

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127   vm244

sorry, I meant

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127 - vm244

And this - from the point of view of vm244 and vm036:

vm056   vm036
    \      |
    vm175  |
      |    |
vm127 - vm244

I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 11:47 AM, Ludwig Krispenz wrote:
On 06/24/2015 11:36 AM, Oleg Fayans wrote:
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected.

It should not need the full information, but it has to be able to reach one of the nodes to be connected. When the topology is broken, you lose the ability to apply a change on any node; e.g. in your case, if you want to connect vm036 and vm056 and have removed vm175, you have to do it on vm056, vm036 or vm244. This should work; if not, we have to fix it - unless we completely prevent disconnecting a topology.

Well, this is exactly the problem here: all replicas should contain precise copies of all the info: accounts, hosts, sudorules, etc., including topology information. However, if in this case I manually connect the disconnected node at vm127 (or vm056, it does not matter), it results in topology information inconsistency across the infrastructure. This would be the topology from the point of view of vm127:

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127   vm244

And this - from the point of view of vm244 and vm036:

vm056   vm036
    \      |
    vm175  |
      |    |
vm127 - vm244

I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 11:36 AM, Oleg Fayans wrote:
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected.

It should not need the full information, but it has to be able to reach one of the nodes to be connected. When the topology is broken, you lose the ability to apply a change on any node; e.g. in your case, if you want to connect vm036 and vm056 and have removed vm175, you have to do it on vm056, vm036 or vm244. This should work; if not, we have to fix it - unless we completely prevent disconnecting a topology.

I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-127.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com
Left node: vm-175.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Number of entries returned 4

=== both vm036 and vm244 thought the topology was as follows:

vm056   vm036
    \      |
    vm175  |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
-- 3 segments matched --
Re: [Freeipa-devel] Topology: Central node removal in star topology
On 06/24/2015 11:25 AM, Ludwig Krispenz wrote:

Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything.

Agree. I just wanted to try edge cases to see how one can break the system :)

Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Yes, here comes the biggest question raised by this case: obviously, when none of the nodes possesses the correct topology information (including the one which deleted the central node), there is no way to fix it by adding segments connecting the nodes that became disconnected. I still think that the recalculation of the resulting tree should be done at least on the node that performs the removal action. And when later some other node gets connected, it should understand somehow that its topology information is outdated.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-127.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com
Left node: vm-175.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Number of entries returned 4

=== both vm036 and vm244 thought the topology was as follows:

vm056   vm036
    \      |
    vm175  |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
-- 3 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-056.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Re: [Freeipa-devel] Topology: Central node removal in star topology
Oleg, the topology plugin relies on existing connections between servers which remain in a topology. If you remove a central node in your topology, you are asking for trouble. With Petr's patch it warns you that your topology will be disconnected, and if you insist we cannot guarantee anything. Should we completely prohibit this? I don't know; I think you could also enforce an uninstall of vm175 with probably the same result. What you mean by calculating the remaining topology and sending it to the remaining servers does not work: it would require sending a removal of a segment, which would be rejected. The topology is broken, and I don't know how much we should invest in making this info consistent on all servers. More interesting would be if we can heal this later by adding new segments.

Ludwig

On 06/24/2015 11:04 AM, Oleg Fayans wrote:

Hi everybody,

The current implementation of the topology plugin (including patch 878 from Petr) allows the deletion of the central node in a star topology. I had the following topology:

vm056   vm036
    \   /  |
    vm175  |
    /   \  |
vm127   vm244

I was able to remove node vm175 from node vm244:

[17:54:48]ofayans@vm-244:~]$ ipa-replica-manage del vm-175.idm.lab.eng.brq.redhat.com
Topology after removal of vm-175.idm.lab.eng.brq.redhat.com will be disconnected:
Server vm-036.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-056.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Server vm-127.idm.lab.eng.brq.redhat.com can't contact servers: vm-244.idm.lab.eng.brq.redhat.com, vm-056.idm.lab.eng.brq.redhat.com, vm-036.idm.lab.eng.brq.redhat.com
Server vm-244.idm.lab.eng.brq.redhat.com can't contact servers: vm-056.idm.lab.eng.brq.redhat.com, vm-127.idm.lab.eng.brq.redhat.com
Continue to delete? [no]: yes
Waiting for removal of replication agreements
unexpected error: limits exceeded for this query

I would expect this operation to delete 4 replication agreements on all nodes:
vm056 - vm175
vm127 - vm175
vm244 - vm175
vm036 - vm175

However, an arbitrary set of replication agreements was deleted on each node, leading to total infrastructure inconsistency:

=== vm056 thought the topology was as follows:

vm056   vm036
        /  |
    vm175  |
    /   \  |
vm127   vm244

[10:28:55]ofayans@vm-056:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-127.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-175.idm.lab.eng.brq.redhat.com-to-vm-244.idm.lab.eng.brq.redhat.com
Left node: vm-175.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Number of entries returned 4

=== both vm036 and vm244 thought the topology was as follows:

vm056   vm036
    \      |
    vm175  |
    /      |
vm127   vm244

[10:26:23]ofayans@vm-036:~]$ ipa topologysegment-find
Suffix name: realm
-- 3 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-056.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-056.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Segment name: vm-127.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com
Left node: vm-127.idm.lab.eng.brq.redhat.com
Right node: vm-175.idm.lab.eng.brq.redhat.com
Connectivity: both
Number of entries returned 3

=== vm127 thought the topology was as follows:

vm056   vm036
    \   /  |
    vm175  |
        \  |
vm127   vm244

[10:31:08]ofayans@vm-127:~]$ ipa topologysegment-find realm
-- 4 segments matched --
Segment name: 036-to-244
Left node: vm-036.idm.lab.eng.brq.redhat.com
Right node: vm-244.idm.lab.eng.brq.redhat.com
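Oleg's expectation above - that removing a server should delete exactly the segments referencing it, on every surviving replica, and nothing else - can be stated as a one-liner over the segment list. A sketch under assumed data shapes; the short segment names are invented, while real FreeIPA segment names look like vm-036.idm.lab.eng.brq.redhat.com-to-vm-175.idm.lab.eng.brq.redhat.com:

```python
# Sketch under assumed data shapes -- invented short segment names, not
# the FreeIPA API. Removing a server should delete exactly the segments
# whose left or right node is that server; everything else must survive.
def segments_to_delete(segments, removed):
    """Return the names of segments whose left or right node is `removed`."""
    return sorted(name for name, (left, right) in segments.items()
                  if removed in (left, right))

segments = {
    "036-to-244": ("vm036", "vm244"),
    "036-to-175": ("vm036", "vm175"),
    "056-to-175": ("vm056", "vm175"),
    "127-to-175": ("vm127", "vm175"),
    "175-to-244": ("vm175", "vm244"),
}
print(segments_to_delete(segments, "vm175"))
# Four segments reference vm175; "036-to-244" is the one that must remain.
```

The inconsistency reported in the thread is precisely that each replica ended up deleting a different subset of this set.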