[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463891#comment-16463891 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~daradurvs]
{quote}what's the reason for the increased number of messages?{quote}
It was a long time ago, so I don't remember all the details. Currently the process works like this:

*One* pass across the ring to find the coordinator.
+
*One* pass across the ring to deliver the coordinator's decision (the new node is at the end of this pass, and the coordinator immediately follows it).

After this task is implemented, it would work like this:

*One* pass across the ring to find the coordinator.
+
*One* pass across the ring to deliver the coordinator's decision to every node except the new node (because other nodes can reject the new node).
+
*One* pass across the ring to find the new node (to deliver the final decision).
+
*One* pass across the ring to find the coordinator (to deliver the final decision).

{quote}Have you benchmarked the prepared solution? What were the results?{quote}
No.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Priority: Major
>              Labels: important
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form the ring in any order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc.
> If a node leaves the topology, its adjacent nodes must reconnect.
> If nodes A, B, C are in one physical location, nodes D, E, F are in another
> location, and the two locations lose connectivity to each other, there are
> many possible reconnection scenarios.
> In the best case, with the ring A-B-CxD-E-FxA ('x' means a disconnect), only
> one reconnect is needed (C connects to A, or F connects to D, depending on
> which part of the cluster stayed alive).
> In the worst case, with the ring AxFxBxExCxDxA, many reconnections are needed
> (A to B, B to C, C to A; in general n/2 reconnections, where n is the number
> of nodes).
> h3. Approach:
> We need an approach for inserting a node at the correct place so that the
> ring topology stays correctly ordered.
> h3. Solutions:
> The main idea is to sort nodes according to latency.
> * group nodes into arcs by an ARC_ID (manually?).
> * implement NodeComparator (nodes on the same host : nodes on the same subnet
> : other nodes). We will use it when we connect a new node.
> * [dev list
> thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec 29, Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a
> public static final constant to TcpDiscoverySpi.
> # Alter
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection)
> to order nodes based on the per-node attribute value.
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs
> are equal, then we should compare the nodes' IDs. This way we have a
> consistent order on all nodes in the topology.
> # Also, nextNode() has to group nodes on the same host and in the same subnet.
> This can be postponed and implemented after the other points are done.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
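The ordering rule from the "Update Dec 29" list (region attribute first, node ID as a tie-break) can be sketched roughly as below. This is only an illustration under stated assumptions: `RegionRingOrder`, `Node`, `RING_ORDER` and this `nextNode` are hypothetical names invented for the sketch, the region is modelled as a plain `long`, and none of it is Ignite's actual TcpDiscoveryNodesRing code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Illustrative only: not Ignite's TcpDiscoveryNodesRing. */
final class RegionRingOrder {
    /** Hypothetical stand-in for a discovery node, holding just the ordering keys. */
    static final class Node {
        final UUID id;
        final long regionId; // plays the role of the proposed CLUSTER_REGION_ID attribute

        Node(UUID id, long regionId) {
            this.id = id;
            this.regionId = regionId;
        }
    }

    /** Region first, node ID as tie-break, so every node computes the same total order. */
    static final Comparator<Node> RING_ORDER =
        Comparator.comparingLong((Node n) -> n.regionId).thenComparing(n -> n.id);

    /** Returns the successor of {@code local} in the sorted ring, wrapping at the end. */
    static Node nextNode(List<Node> topology, Node local) {
        List<Node> sorted = new ArrayList<>(topology);
        sorted.sort(RING_ORDER);
        int idx = sorted.indexOf(local); // identity match: Node does not override equals
        if (idx < 0)
            throw new IllegalArgumentException("local node is not in the topology");
        return sorted.get((idx + 1) % sorted.size());
    }

    public static void main(String[] args) {
        Node a = new Node(new UUID(0L, 3L), 1L);
        Node b = new Node(new UUID(0L, 1L), 2L);
        Node c = new Node(new UUID(0L, 2L), 1L);
        List<Node> topology = List.of(a, b, c);
        // Sorted ring: c, a (both region 1, ordered by ID), then b (region 2).
        System.out.println("next(c) is a: " + (nextNode(topology, c) == a));
        System.out.println("next(b) wraps to c: " + (nextNode(topology, b) == c));
    }
}
```

Because the comparator is a total order over (region, ID), any two nodes that see the same topology compute the same successor, which is the consistency property the list asks for. Grouping by host and subnet would add further keys ahead of the ID tie-break.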
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463856#comment-16463856 ]

Vyacheslav Daradur commented on IGNITE-4501:
--------------------------------------------

[~sharpler], what's the reason for the increased number of messages?
Have you benchmarked the prepared solution? What were the results?

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Priority: Major
>              Labels: important
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463821#comment-16463821 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~daradurvs] I think it's better not to implement this task. When I was working on it, I found that if we allow nodes to take an arbitrary place in the ring, then we have to pass 2x more messages across the ring to reach consensus.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Priority: Major
>              Labels: important
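The "2x more messages" figure follows from the pass counts given elsewhere in this thread: two passes across the ring today versus four with arbitrary placement. A back-of-the-envelope sketch, assuming (as a simplification that the thread does not state) that every pass costs one hop per node; `RingPassCost` and its members are hypothetical names:

```java
/** Rough cost model only: treats each pass as one hop per node in the ring. */
final class RingPassCost {
    /** Passes per join today: find coordinator, then deliver its decision. */
    static final int CURRENT_PASSES = 2;

    /** Passes per join with arbitrary placement, as listed in the thread. */
    static final int PROPOSED_PASSES = 4;

    /** Upper-bound hop count for the given number of passes over a ring of ringSize nodes. */
    static long hops(int passes, int ringSize) {
        if (passes < 0 || ringSize < 0)
            throw new IllegalArgumentException("passes and ringSize must be non-negative");
        return (long) passes * ringSize;
    }

    public static void main(String[] args) {
        int ringSize = 6; // the A..F example from the issue description
        System.out.println("current:  " + hops(CURRENT_PASSES, ringSize) + " hops");
        System.out.println("proposed: " + hops(PROPOSED_PASSES, ringSize) + " hops");
    }
}
```

Under this model the ratio is 4/2 = 2 regardless of ring size, so the overhead grows linearly with the number of nodes. No benchmark backs these numbers; the thread states that none was run.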
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463731#comment-16463731 ]

ASF GitHub Bot commented on IGNITE-4501:
----------------------------------------

Github user SharplEr closed the pull request at:

    https://github.com/apache/ignite/pull/1676

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Priority: Major
>              Labels: important
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029261#comment-16029261 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

[~sharpler], I thought this ticket over one more time and I think we should postpone it for now. The changes seem too complex just for the ability to specify a new node's position in the ring. Even if we implement this properly, we might still have a lot of latency-related problems with topologies that span several data centers connected by higher-latency inter-DC links.

I suggest unscheduling this from 2.1. Let's return to it at some point.

Thanks!

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>              Labels: important

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992726#comment-15992726 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] Yakov, I have fixed this problem too, but found a new one. In GridDhtPartitionTopologyImpl#partitionMap(boolean onlyActive) I get an assertion failure because node2part.valid() is false. I spent a week trying to understand what it means and how it is connected with the discovery ring, but failed.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>              Labels: important
>             Fix For: 2.1
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974624#comment-15974624 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

[~sharpler], I disagree.

When it starts processing a message, a node should check whether the local node is the coordinator and, if it is not, forward the message across the ring. This is how it works now, and it seems this should not be changed for now.

Thanks!

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>              Labels: important
>             Fix For: 2.1
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972575#comment-15972575 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov],

One more new case can happen. Suppose the message reaches the coordinator, the coordinator sends it to the next node, and then the coordinator crashes. Some node in the middle of the ring becomes the new coordinator, and the message reaches this new coordinator. In that case, none of the nodes after the new coordinator will see the node-added message.

This can happen because coordinator candidates are sorted in a different order than the nodes in the ring.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971246#comment-15971246 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov], I can't find a good way to do this.

I can send the message across the ring more times, but that is not optimal.

Alternatively, I can first send the message to the coordinator while skipping the new node; when it reaches the coordinator a second time, send it from the coordinator to the new node; and finally send it to the coordinator a third time. But that is too complex a change.

What do you think?

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969196#comment-15969196 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

Alexander, you are right. Before your change, a new node was always placed right before the coordinator. Did you manage to fix this? I am thinking of clearing the discovery data on the node right before the coordinator node.

I see one more point here. If the new node is placed in the middle of the ring, then (with the current approach) it finishes its start before the rest of the ring (from the new node to the coordinator) has fired NODE_ADDED_EVT for it. We will probably need to rethink the joining process.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968139#comment-15968139 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I understand what is going on. The region ID breaks an optimization in ServerImpl.RingMessageWorker#processNodeAddedMessage() that clears the discovery data from TcpDiscoveryNodeAddedMessage when the new node sends TcpDiscoveryJoinRequestMessage to the coordinator. With region IDs the new node can take any position in the ring, so when the coordinator verifies it and sends the verified message through the ring, the new node is no longer at the end of the ring. The optimization still clears the data from the message, and the following nodes get an NPE when they try to read that data.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form a ring in any possible order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc.
> If a node leaves the topology, the adjacent nodes must reconnect.
> If nodes A, B, C are in one physical place, nodes D, E, F are in another place, and the two places lose connection to each other, there are many possible ways of reconnecting.
> In the best case, if we had the ring A-B-CxD-E-FxA ('x' means disconnect), we have only one reconnect (C will be connected to A, or F will be connected to D, depending on which part of the cluster stayed alive).
> Also, if the ring alternated between the two places, as in AxFxBxExCxDxA, we have a lot of reconnections (A to B, B to C, C to A; in general n/2 reconnections, where n is the number of nodes).
> h3. Approach:
> It is necessary to develop an approach that inserts a node into the correct place, so that the correct ring topology is created.
> h3. Solutions:
> The main idea is sorting according to latency.
> * group nodes in arcs by an ARC_ID. (manually?)
> * implement NodeComparator (nodes on the same host : nodes on the same subnet : other nodes). We will use it when we connect a new node.
> * [dev list thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec, 29 Yakov Zhdanov:
> # introduce CLUSTER_REGION_ID node attribute. This can be done by adding a public static final constant to TcpDiscoverySpi.
> # Alter org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection) to order nodes based on the per-node attribute value
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs are equal then we should compare nodes' IDs. This way we have a consistent order on all nodes in the topology.
> # Also nextNode() has to group nodes on the same host and in the same subnet. This can be postponed and implemented after we have the other points done.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
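The ordering rule from the issue description (compare by CLUSTER_REGION_ID first, then fall back to the node ID so that the order is the same on every node) can be illustrated with a small sketch. This is not the code from the pull request: the `Node` class, its fields, and `ringRegions` are hypothetical stand-ins, and the real TcpDiscoveryNode carries far more state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.UUID;

/** Illustrative sketch only; none of these types are Ignite classes. */
final class RegionOrderingSketch {
    /** Hypothetical stand-in for a discovery node: a region ID plus a unique node ID. */
    static final class Node {
        final long regionId;
        final UUID id;

        Node(long regionId, UUID id) {
            this.regionId = regionId;
            this.id = id;
        }
    }

    /** Orders by region first, then by node ID, so two distinct nodes never compare as equal. */
    static final Comparator<Node> BY_REGION_THEN_ID =
        Comparator.<Node>comparingLong(n -> n.regionId).thenComparing(n -> n.id);

    /** Returns the region IDs of the given nodes in ring order. */
    static List<Long> ringRegions(List<Node> nodes) {
        TreeSet<Node> ring = new TreeSet<>(BY_REGION_THEN_ID);
        ring.addAll(nodes);

        List<Long> regions = new ArrayList<>();
        for (Node n : ring)
            regions.add(n.regionId);

        return regions;
    }
}
```

Because the node ID breaks every tie, a sorted set built with this comparator keeps both nodes of the same region instead of treating them as duplicates, and every member of the topology derives the same ring order from the same set of nodes.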
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967679#comment-15967679 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I am doing my best, and I hope I will manage to do it before the 2.0 release. The problem is an NPE in ServerImpl.RingMessageWorker#processNodeAddedMessage(), in these lines:

    DiscoveryDataPacket dataPacket = msg.gridDiscoveryData();

    if (dataPacket.hasJoiningNodeData())
    //^___Here, because dataPacket is null

I can fix it by removing one line:

    msg.clearDiscoveryData();

#processNodeAddedMessage() is really complex; it takes a while to understand what is going on in there.
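The failure mode described in this comment can be reproduced in miniature. The sketch below is an assumption-laden illustration, not Ignite code: `DataPacket`, `NodeAddedMessage`, and `discoveryData()` are hypothetical stand-ins for DiscoveryDataPacket, TcpDiscoveryNodeAddedMessage, and gridDiscoveryData(). It shows a defensive null check, which is a different technique from the fix proposed in the comment (removing the clearDiscoveryData() call).

```java
/** Illustrative sketch only; none of these types are Ignite classes. */
final class ClearedDataSketch {
    /** Hypothetical stand-in for the discovery data carried by a node-added message. */
    static final class DataPacket {
        private final boolean joiningNodeData;

        DataPacket(boolean joiningNodeData) {
            this.joiningNodeData = joiningNodeData;
        }

        boolean hasJoiningNodeData() {
            return joiningNodeData;
        }
    }

    /** Hypothetical message whose payload can be cleared by an earlier hop in the ring. */
    static final class NodeAddedMessage {
        private DataPacket data;

        NodeAddedMessage(DataPacket data) {
            this.data = data;
        }

        DataPacket discoveryData() {
            return data;
        }

        void clearDiscoveryData() {
            data = null;
        }
    }

    /** Null-safe check: a cleared payload is reported as "no joining node data" instead of throwing. */
    static boolean hasJoiningNodeData(NodeAddedMessage msg) {
        DataPacket packet = msg.discoveryData();

        return packet != null && packet.hasJoiningNodeData();
    }
}
```

A guard like this only avoids the exception. If nodes further along the ring actually need the data, the payload has to stay in the message until the last of them has processed it, which is what removing the clearing call achieves.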
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967592#comment-15967592 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

[~sharpler], thanks for the update! When do you think we can expect the fix? This issue is a good candidate for merging to 2.0.
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967563#comment-15967563 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I have fixed the bug. It was a data race on the local node's internal order in ServerImpl. This code is not mine, so I don't know why it works in master. Anyway, after my fix this test became totally stable. But I found an error in my own test. It's strange, because I checked for errors before making the PR. Perhaps something broke after the merge with 2.0. I need more time to fix it.
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961196#comment-15961196 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

Yakov, it's a very strange error. Thank you for finding it. The error happens with 13% probability. I need more time to fix it; please wait until Monday.
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957944#comment-15957944 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

Alexander, I checked out your changes to finalize and commit, but discovered this failure in org.apache.ignite.spi.discovery.tcp.TcpDiscoverySelfTest#testFailedNodes4

{noformat}
[02:41:39,056][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode [id=4215172c-d71b-4fd1-8baf-73ad9762, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, lastExchangeTime=1491432076544, loc=true, ver=2.0.0#19700101-sha1:, clusterRegionId=-9223372036854775808, isClient=false], other=TcpDiscoveryNode [id=7063b039-493e-4aad-9036-30c96210, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:, clusterRegionId=-9223372036854775808, isClient=false]]
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
    at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
    at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
    at java.util.TreeMap.compare(TreeMap.java:1291)
    at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
    at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
    at java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
    at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
    at java.util.TreeSet.isEmpty(TreeSet.java:216)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2676)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4940)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2547)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2349)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6398)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2435)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[02:41:39,059][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi] Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%]
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode [id=4215172c-d71b-4fd1-8baf-73ad9762, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, lastExchangeTime=1491432076544, loc=true, ver=2.0.0#19700101-sha1:, clusterRegionId=-9223372036854775808, isClient=false], other=TcpDiscoveryNode [id=7063b039-493e-4aad-9036-30c96210, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:, clusterRegionId=-9223372036854775808, isClient=false]]
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
    at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
    at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
    at java.util.TreeMap.compare(TreeMap.java:1291)
    at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
    at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
    at java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
    at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
    at java.util.TreeSet.isEmpty(TreeSet.java:216)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
    at
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954971#comment-15954971 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I created an Upsource review:
http://reviews.ignite.apache.org/ignite/review/IGNT-CR-158

Please look at the code and at my answers on GitHub. I want to include these changes in the 2.0 release, but time is running out.
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947195#comment-15947195 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I applied the changes and answered the questions. If, after my answer in the review, you still think it is better to make the default region ID zero rather than Long.MIN_VALUE, I will accept that.
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947080#comment-15947080 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

Alexander, please review my comments in the PR.

Next time, can you please create an Upsource review?
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943319#comment-15943319 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

[~yzhdanov] I reopened the PR: https://github.com/apache/ignite/pull/1676/files
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
[ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943297#comment-15943297 ]

ASF GitHub Bot commented on IGNITE-4501:
----------------------------------------

GitHub user SharplEr opened a pull request:

    https://github.com/apache/ignite/pull/1676

    IGNITE-4501: Improvement of connection in a cluster of new node

    Reopen that PR: https://github.com/apache/ignite/pull/1436

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SharplEr/ignite ignite-4501

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/1676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1676

commit 0e7768224e8ae26797460e2b750d04979c49c92f
Author: Alexander Menshikov
Date:   2017-01-20T12:35:05Z

    Add using CLUSTER_REGION_ID for ordering nodes in ring

commit c43894854a2bdcb5d39d92552f0e105c70a10198
Author: Alexander Menshikov
Date:   2017-01-20T12:35:53Z

    Add tests for RegionNodeComparator

commit 5a7d5e395c00ab7e6eafa03ed29ae4eea754cbc1
Author: Alexander Menshikov
Date:   2017-01-20T12:44:28Z

    Add test to suite

commit cdcda327773b61beb90539e7c11130906d0525dc
Author: Alexander Menshikov
Date:   2017-01-21T10:57:54Z

    A little clean up code

commit 620be443f3ef7bd4730ba755fb4e110d8dcd4918
Author: Alexander Menshikov
Date:   2017-01-25T12:26:42Z

    Change ordering inside nodes field, and add maxNode field to avoid using old ordering. Add method variant of serverNodes method which return only necessary part of ring. That make complexity of nextNode equals O(log n + k), which better then O(n). n is number of nodes, k is number of client nodes.

commit b45bad8b5dcf9d3904805503b1c327d4a6686d60
Author: Alexander Menshikov
Date:   2017-01-25T13:03:27Z

    Add filed to TcpDiscoveryNode for faster getting cluster_region_id

commit d9b68dd3cc4f546d73c475c7adf82ea797d851f2
Author: Alexander Menshikov
Date:   2017-01-27T15:01:38Z

    Fix bug in RegionNodeComparator

commit f36f07da21ca51bf2585fb80b82239541a3a16d6
Author: Alexander Menshikov
Date:   2017-01-27T15:02:30Z

    Cleanup code in RegionNodeComparatorTest

commit 145394cb5c10e5f1127071f1cb7957a3c1307a9d
Author: Alexander Menshikov
Date:   2017-01-27T15:09:55Z

    Add catch NumberFormatException

commit eeaf244788bd957a2b8f1e3b33d00338072edde1
Author: Alexander Menshikov
Date:   2017-01-27T15:34:37Z

    cleanup TcpDiscoveryNodesRing

commit a3431dec0103f815803af744ec3c75b7188c19fb
Author: Alexander Menshikov
Date:   2017-01-30T10:37:16Z

    Code cleanup

commit a41ee660eb275714bd924dfae7bf01c3274300f1
Author: Alexander Menshikov
Date:   2017-01-30T11:58:26Z

    Add @NotNull to RegionNodeComparator.compare

commit 8ebd887797cba7adf3ca08ea8af8ac43e29be221
Author: Alexander Menshikov
Date:   2017-01-30T12:06:09Z

    Add @Nullable to TcpDiscoveryNode.getClusterRegionId

commit fd91a18a421e951638c9f9078c34169916adde5e
Author: Alexander Menshikov
Date:   2017-01-30T12:07:57Z

    cleanup modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/TcpDiscoveryNodesRing.java

commit f4652c888d07c1f52945c801be3e220d55895860
Author: Alexander Menshikov
Date:   2017-01-30T16:31:44Z

    Remove using cast to Nubmer

commit 6a3ac2eccd7a01bfde73def424f94d80da0c75c1
Author: Alexander Menshikov
Date:   2017-01-30T16:37:20Z

    restore code format

commit ecb82ffd908cf3de8618c8d557d4077f9341e3cc
Author: Alexander Menshikov
Date:   2017-01-30T16:45:14Z

    restore code format2

commit 2e3ec7797327d722160bdee1b0803508ce9b4058
Author: Alexander Menshikov
Date:   2017-02-01T17:55:24Z

    fix reusing variable

commit 77055549e20248c0f22eefc43e839ba82a3f3ecc
Author: Alexander Menshikov
Date:   2017-02-01T17:57:55Z

    Fix lost regionId after deserialization

commit 30d3cd992d0c4cf76b2b368c7a336863eb3b5e2d
Author: Alexander Menshikov
Date:   2017-02-01T17:58:57Z

    Add test for save sorting

commit 3e204be6d14d356d6cda6c11f2cd8d730348f44d
Author: Alexander Menshikov
Date:   2017-02-02T11:25:24Z

    Add RegionTcpDiscoverySelfTest

commit f1bdd6c936f7feae2535ab3f4dae8dddc9a4cde0
Author: Alexander Menshikov
Date:   2017-02-02T11:31:08Z

    Add license

commit 5e9343b96ecdcf25f0bdabf7468a2bd503ec7773
Author: Alexander Menshikov
Date:
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
    [ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878011#comment-15878011 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

Yakov, I fixed it. Please look again.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form a ring in any possible order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A,
> etc.
> If some node leaves the topology, its adjacent nodes must reconnect.
> If nodes A, B, C are in one physical location, nodes D, E, F are in another
> location, and the two locations lose connectivity to each other, there are
> many possible ways of reconnecting.
> In the best case, if we had the ring A-B-CxD-E-FxA ('x' means disconnect),
> we need only one reconnect (C will be connected to A, or F will be connected
> to D, depending on which part of the cluster stays alive).
> However, if we had a badly ordered ring, AxFxBxExCxDxA, we need a lot of
> reconnections (A to B, B to C, C to A -- in general n/2 reconnections, where
> n is the number of nodes).
> h3. Approach:
> It is necessary to develop a way of inserting a node at the correct place
> so that the correct ring topology is created.
> h3. Solutions:
> The main idea is sorting according to latency.
> * group nodes in arcs by an ARC_ID (manually?)
> * implement NodeComparator (nodes on the same host : nodes on the same subnet
> : other nodes). We will use it when we connect a new node.
> * [dev list
> thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec, 29 Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a
> public static final constant to TcpDiscoverySpi.
> # Alter
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection)
> to order nodes based on the per-node attribute value.
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs
> are equal then we should compare nodes' IDs. This way we have a consistent
> order on all nodes in the topology.
> # Also, nextNode() has to group nodes on the same host and in the same subnet.
> This can be postponed and implemented after we have the other points done.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
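A minimal sketch of the ordering rule from the Dec 29 update (region ID first, node ID as the tie-breaker), applied to the six-node example from the description. This is not the code from the pull request: the class RegionRingSketch, its nested Node type, and the methods compare, crossRegionLinks and interleavedRing are all hypothetical names introduced for illustration. Placing nodes without a region ID last is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

/** Illustration only: none of these types are Ignite classes. */
final class RegionRingSketch {
    /** Hypothetical stand-in for a discovery node. */
    static final class Node {
        final String name;
        final UUID id;
        final Long regionId; // assumed: null when no CLUSTER_REGION_ID is set

        Node(String name, UUID id, Long regionId) {
            this.name = name;
            this.id = id;
            this.regionId = regionId;
        }
    }

    /** Region ID first (nodes without one go last), then node ID as a stable tie-breaker. */
    static int compare(Node a, Node b) {
        if (a.regionId != null && b.regionId != null) {
            int c = Long.compare(a.regionId, b.regionId);
            if (c != 0)
                return c;
        }
        else if (a.regionId != null)
            return -1;
        else if (b.regionId != null)
            return 1;

        return a.id.compareTo(b.id);
    }

    /** Counts ring links (including last -> first) whose endpoints are in different regions. */
    static int crossRegionLinks(List<Node> ring) {
        int cnt = 0;
        for (int i = 0; i < ring.size(); i++) {
            Node cur = ring.get(i);
            Node next = ring.get((i + 1) % ring.size());
            if (!Objects.equals(cur.regionId, next.regionId))
                cnt++;
        }
        return cnt;
    }

    /** The ring A-F-B-E-C-D from the description: A, B, C in region 1; D, E, F in region 2. */
    static List<Node> interleavedRing() {
        List<Node> ring = new ArrayList<>();
        String order = "AFBECD";
        for (int i = 0; i < order.length(); i++) {
            char ch = order.charAt(i);
            long region = ch <= 'C' ? 1L : 2L;
            ring.add(new Node(String.valueOf(ch), new UUID(0L, ch - 'A'), region));
        }
        return ring;
    }
}
```

In the interleaved ring every one of the six links crosses regions, which corresponds to AxFxBxExCxDxA after a split. Calling ring.sort(RegionRingSketch::compare) reorders the nodes to A, B, C, D, E, F, leaving only the two cross-region links C-D and F-A. Because the comparator depends only on node attributes, every node that sorts the same membership list arrives at the same order.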
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
    [ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874651#comment-15874651 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

Alexander, I reviewed the changes. Please see my comments on the GitHub PR. I think it is safe to release it in 2.0. Thanks!
[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
    [ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867564#comment-15867564 ]

Alexander Menshikov commented on IGNITE-4501:
---------------------------------------------

The work is already done. I'm just waiting for code review.