[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2018-05-04 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463891#comment-16463891
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~daradurvs]
{quote}what is the reason for the increased number of messages?
{quote}
It was a long time ago, so I don't remember all the details. Currently the
process works like this:

*One* pass across the ring to find the coordinator.

+

*One* pass across the ring to deliver the coordinator's decision (the new node
is at the end of this pass, with the coordinator right behind it).

After this task is implemented, it would work like this:

*One* pass across the ring to find the coordinator.

+

*One* pass across the ring to deliver the coordinator's decision to every node
except the new node (because other nodes can reject the new node).

+

*One* pass across the ring to reach the new node (to deliver the final
decision).

+

*One* pass across the ring to reach the coordinator (to deliver the final
decision).
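
A rough way to compare the two schemes is to count ring passes and per-node
hops. The sketch below is an illustration only, not Ignite code: it assumes
that every pass visits each node of the ring once (so one pass costs one hop
per node) and simply encodes the two pass lists above. The class
RingPassCount and its members are hypothetical names.

```java
/** Toy cost model for the two join schemes listed above; not Ignite code. */
final class RingPassCount {
    /** Passes today: find the coordinator, then deliver its decision. */
    static final int CURRENT_PASSES = 2;

    /**
     * Passes with arbitrary placement: find the coordinator, deliver the
     * decision without the new node, reach the new node, reach the coordinator.
     */
    static final int PROPOSED_PASSES = 4;

    /** Assumption: one full pass costs one hop per node in the ring. */
    static int hops(int passes, int ringSize) {
        if (ringSize <= 0)
            throw new IllegalArgumentException("ringSize must be positive");

        return passes * ringSize;
    }
}
```

Under this assumption a 6-node ring needs hops(CURRENT_PASSES, 6) = 12 hops
today and hops(PROPOSED_PASSES, 6) = 24 hops after the change, which is the
doubling mentioned elsewhere in this thread. The last two passes may in
practice be shorter than a full ring, so treat this as an upper bound.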

 
{quote}Have you benchmarked the prepared solution, and what are the results?
{quote}
No.

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
> Issue Type: Improvement
> Components: messaging
> Affects Versions: 1.8
> Reporter: Vyacheslav Daradur
> Priority: Major
> Labels: important
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form a ring in any possible order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A,
> etc.
> If a node leaves the topology, the adjacent nodes must reconnect.
> If nodes A, B, C are in one physical location, nodes D, E, F are in another,
> and the two locations lose connectivity to each other, there are many
> possible reconnection patterns.
> In the best case, if we had the ring A-B-CxD-E-FxA ('x' means disconnect),
> we need only one reconnect (C is connected to A, or F is connected to D,
> depending on which part of the cluster stayed alive).
> However, if we had a badly ordered ring such as AxFxBxExCxDxA, we need a lot
> of reconnections (A to B, B to C, C to A; in general n/2 reconnections, where
> n is the number of nodes).
> h3. Approach:
> We need to develop a way to insert a node at the correct place in the ring
> so that the resulting ring topology is correct.
> h3. Solutions:
> The main idea is to sort nodes according to latency.
> * group nodes into arcs by an ARC_ID (manually?)
> * implement NodeComparator (nodes on the same host : nodes on the same subnet
> : other nodes). We will use it when we connect a new node.
> * [dev list 
> thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec, 29 Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a
> public static final constant to TcpDiscoverySpi.
> # Alter 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection)
>  to order nodes based on the per-node attribute value.
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs
> are equal then we should compare the nodes' IDs. This way we have a
> consistent order on all nodes in the topology.
> # Also, nextNode() has to group nodes on the same host and in the same
> subnet. This can be postponed and implemented after the other points are done.
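
The ordering rule from the update above (region first, node ID as a
tie-breaker) can be sketched with a plain comparator. This is only an
illustration under stated assumptions: RegionNode and RegionRingOrder are
hypothetical classes, the region is modelled as an int, and nothing here
touches the real TcpDiscoveryNodesRing or TcpDiscoverySpi code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Hypothetical stand-in for a discovery node: only the fields needed for ordering. */
final class RegionNode {
    final UUID id;
    final int regionId; // Stands in for the proposed CLUSTER_REGION_ID attribute.

    RegionNode(UUID id, int regionId) {
        this.id = id;
        this.regionId = regionId;
    }
}

/** Sketch of a stable ring order: region first, node ID as tie-breaker. */
final class RegionRingOrder {
    static final Comparator<RegionNode> BY_REGION_THEN_ID =
        Comparator.<RegionNode>comparingInt(n -> n.regionId)
            .thenComparing(n -> n.id);

    /** Returns a new list in ring order; the input list is left untouched. */
    static List<RegionNode> ringOrder(List<RegionNode> nodes) {
        List<RegionNode> sorted = new ArrayList<>(nodes);

        sorted.sort(BY_REGION_THEN_ID);

        return sorted;
    }
}
```

Because the comparison uses only data that every node sees identically (the
region value and the node ID), each node that sorts the same membership list
gets the same order, which is the consistency property asked for in point 3.
Grouping by host and subnet (point 4) is not covered by this sketch.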



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2018-05-04 Thread Vyacheslav Daradur (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463856#comment-16463856
 ] 

Vyacheslav Daradur commented on IGNITE-4501:


[~sharpler], what is the reason for the increased number of messages?
Have you benchmarked the prepared solution, and what are the results?



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2018-05-04 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463821#comment-16463821
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~daradurvs] I think it's better not to implement this task. While working on
it, I found that if we allow nodes to take an arbitrary position in the ring,
we have to pass twice as many messages across the ring to reach consensus.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2018-05-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463731#comment-16463731
 ] 

ASF GitHub Bot commented on IGNITE-4501:


Github user SharplEr closed the pull request at:

https://github.com/apache/ignite/pull/1676




[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-05-30 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029261#comment-16029261
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

[~sharpler], I thought this ticket over one more time and I think we should 
postpone it for now. The changes seem too complex just for the ability to 
choose a new node's position in the ring. Even if we implement this properly, 
we might still have a lot of latency-related problems in topologies spanning 
several data centers with higher-latency inter-DC connections.

I suggest unscheduling this from 2.1. Let's return to it at some point.

Thanks!

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
> Issue Type: Improvement
> Components: messaging
> Affects Versions: 1.8
> Reporter: Vyacheslav Daradur
> Assignee: Alexander Menshikov
> Labels: important
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-05-02 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992726#comment-15992726
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]
Yakov, I have fixed this problem too, but found a new one. In 
GridDhtPartitionTopologyImpl#partitionMap(boolean onlyActive) I get an 
assertion failure because node2part.valid() is false. I spent a week trying to 
understand what it means and how it is connected with the discovery ring, but 
failed.

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
> Issue Type: Improvement
> Components: messaging
> Affects Versions: 1.8
> Reporter: Vyacheslav Daradur
> Assignee: Alexander Menshikov
> Labels: important
> Fix For: 2.1
>
>


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-19 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974624#comment-15974624
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

[~sharpler], I disagree.
When it starts processing a message, a node should check whether the local 
node is the coordinator and, if it is not, forward the message across the 
ring. This is how it works now, and it seems this should not be changed for now.

Thanks!
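
The forwarding rule described above can be modelled in a few lines. This is a
toy sketch, not the Ignite implementation: the ring is a plain list of node
names, each node either is the coordinator or forwards to its successor, and
the class ForwardToCoordinator is a hypothetical name.

```java
import java.util.List;

/** Toy model of "forward until the coordinator is reached"; not Ignite code. */
final class ForwardToCoordinator {
    /**
     * Walks the ring starting at startIdx and returns how many times the
     * message is forwarded before the coordinator holds it.
     */
    static int forwardsUntilCoordinator(List<String> ring, int startIdx, String coordinator) {
        if (!ring.contains(coordinator))
            throw new IllegalArgumentException("coordinator is not in the ring");

        int idx = startIdx;
        int forwards = 0;

        // Each node checks "am I the coordinator?"; if not, it forwards to its successor.
        while (!ring.get(idx).equals(coordinator)) {
            idx = (idx + 1) % ring.size();
            forwards++;
        }

        return forwards;
    }
}
```

For the ring A, B, C, D with coordinator A, a message that starts at B is
forwarded three times (B to C, C to D, D to A), while a message that starts at
A is not forwarded at all.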



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-18 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972575#comment-15972575
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov],

One more new case can happen. Suppose the message reaches the coordinator, the 
coordinator sends it to the next node and then crashes, and some node in the 
middle of the ring becomes the new coordinator. Once the message reaches this 
new coordinator, the nodes after the new coordinator will not see the added 
message.

This can happen because coordinator candidates are sorted in a different order 
than the nodes in the ring.

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
> Issue Type: Improvement
> Components: messaging
> Affects Versions: 1.8
> Reporter: Vyacheslav Daradur
> Assignee: Alexander Menshikov
> Fix For: 2.0
>
>


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-17 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971246#comment-15971246
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov],

I can't find a good way. I can send the message across the ring more times, 
but that is not optimal. Or I can first send the message to the coordinator, 
skipping the new node; when it reaches the coordinator the second time, send 
it from the coordinator to the new node; and finally send it to the 
coordinator a third time. But that is too complex a change.

What do you think?


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-14 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969196#comment-15969196
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

Alexander, you are right. Before your change, a new node was always placed 
right before the coordinator. 

Did you manage to fix this? I think discovery data should be cleared on the 
node right before the coordinator node.

I see one more point here. If the new node is placed in the middle of the ring 
then (with the current approach) it finishes its start before the rest of the 
ring (from the new node to the coordinator) has fired NODE_ADDED_EVT for it. 
We will probably need to rework the joining process.


[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-13 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968139#comment-15968139
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]

I understand what is going on. The region ID breaks an optimization in 
ServerImpl.RingMessageWorker#processNodeAddedMessage() that clears discovery 
data from TcpDiscoveryNodeAddedMessage when a new node sends 
TcpDiscoveryJoinRequestMessage to the coordinator. If the new node takes an 
arbitrary position in the ring, the coordinator verifies it and sends the 
verified message through the ring, but the new node is no longer at the end of 
the ring. The optimization clears the data from the message, and the following 
nodes get an NPE when they try to read it.
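
The failure mode described above can be shown with a toy model. It is a sketch
under one simplifying assumption, namely that the discovery data is dropped
from the message as soon as the new node has processed it; the real logic in
processNodeAddedMessage() is more involved. ClearedDataPass is a hypothetical
class and the ring is just a list of node names in the order the verified
message visits them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "clear discovery data" optimization; not Ignite code. */
final class ClearedDataPass {
    /**
     * Returns the nodes that receive the message after its data was cleared.
     * In the real system these are the nodes that would hit the NPE.
     */
    static List<String> nodesWithoutData(List<String> visitOrder, String newNode) {
        List<String> starved = new ArrayList<>();
        boolean cleared = false;

        for (String node : visitOrder) {
            if (cleared)
                starved.add(node); // The message reaches this node with no data.

            // Assumption: the data is dropped once the new node has processed it.
            if (node.equals(newNode))
                cleared = true;
        }

        return starved;
    }
}
```

When the new node N is visited last (B, C, N) no node is affected, which
matches the old behavior. When N sits in the middle (B, N, C, D), nodes C and D
receive a message whose data is already gone.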

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
>  Issue Type: Improvement
>  Components: messaging
>Affects Versions: 1.8
>Reporter: Vyacheslav Daradur
>Assignee: Alexander Menshikov
> Fix For: 2.0
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example: we have 6 nodes: A, B, C, D, E, F. 
> They can form a ring in any possible order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, 
> etc.
> If some node leaves the topology, adjacent nodes must reconnect. 
> If nodes A, B, C are in the same physical place, nodes D, E, F are in another 
> place, and the two places lose connection to each other, there are many 
> possible patterns of reconnection.
> In the best case, if we had the ring A-B-CxD-E-FxA ('x' means disconnect), 
> we have only one reconnect (C will be connected to A, or F will be connected 
> to D, depending on which part of the cluster stayed alive).
> If instead the ring were interleaved, AxFxBxExCxDxA, we have a lot of 
> reconnections (A to B, B to C, C to A; in general n/2 reconnections, where 
> n is the number of nodes). 
> h3. Approach:
> It is necessary to develop an approach for inserting a node at the correct 
> place so that the ring topology is built correctly.
> h3. Solutions:
> The main idea is to sort nodes according to latency.
> * group nodes into arcs by an ARC_ID (manually?)
> * implement NodeComparator (nodes on the same host : nodes on the same subnet 
> : other nodes). We will use it when we connect a new node.
> * [dev list 
> thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec, 29 Yakov Zhdanov:
> # introduce CLUSTER_REGION_ID node attribute. This can be done by adding 
> public static final constant to TcpDiscoverySpi.
> # Alter 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection)
>  to order based on a per-node attribute value
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs 
> are equal then we should compare nodes' IDs. This way we have consistent 
> order on all nodes in topology.
> # Also nextNode() has to group nodes on same host and in same subnet. This 
> can be postponed and implemented after we have other points done.
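Point 3 of the list above (compare by region ID, then fall back to node ID) can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: NodeSketch and RegionOrderSketch are hypothetical names, the node is reduced to a label, a UUID and a long region ID, and the ring is modelled as a TreeSet. Long.compare is used instead of subtraction because region IDs such as Long.MIN_VALUE, which this thread mentions as a default, would overflow a subtraction-based comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical minimal node; a real discovery node carries much more state. */
final class NodeSketch {
    final String name;   // label used only for printing
    final UUID id;       // unique node ID, used as the tie-breaker
    final long regionId; // stand-in for the CLUSTER_REGION_ID attribute value

    NodeSketch(String name, UUID id, long regionId) {
        this.name = name;
        this.id = id;
        this.regionId = regionId;
    }
}

final class RegionOrderSketch {
    /** Region ID first; node ID breaks ties, so all nodes agree on the order. */
    static final Comparator<NodeSketch> BY_REGION_THEN_ID = (a, b) -> {
        // Long.compare is safe for extreme values; (a - b) could overflow.
        int c = Long.compare(a.regionId, b.regionId);

        return c != 0 ? c : a.id.compareTo(b.id);
    };

    /** Next node in the sorted ring, wrapping around after the last one. */
    static NodeSketch next(TreeSet<NodeSketch> ring, NodeSketch cur) {
        NodeSketch n = ring.higher(cur); // O(log n) lookup in the tree

        return n != null ? n : ring.first();
    }

    /** Node labels in ring order. */
    static List<String> names(TreeSet<NodeSketch> ring) {
        List<String> res = new ArrayList<>();

        for (NodeSketch n : ring)
            res.add(n.name);

        return res;
    }

    public static void main(String[] args) {
        TreeSet<NodeSketch> ring = new TreeSet<>(BY_REGION_THEN_ID);

        NodeSketch a = new NodeSketch("a", new UUID(0L, 1L), 1L);
        NodeSketch b = new NodeSketch("b", new UUID(0L, 2L), 2L);
        NodeSketch c = new NodeSketch("c", new UUID(0L, 3L), 1L);

        ring.add(b);
        ring.add(c);
        ring.add(a);

        // Nodes of region 1 are adjacent regardless of insertion order.
        System.out.println(names(ring));        // [a, c, b]
        System.out.println(next(ring, b).name); // a
    }
}
```

Because the comparator never returns 0 for two distinct node IDs, a sorted set keeps every node; a comparator that reported equality for different nodes would silently drop one of them.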





[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-13 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967679#comment-15967679
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]
I am doing my best and hope to finish it before the 2.0 release. The problem is 
an NPE in ServerImpl.RingMessageWorker#processNodeAddedMessage(), in these lines:

DiscoveryDataPacket dataPacket = msg.gridDiscoveryData();
if (dataPacket.hasJoiningNodeData())
   //^___Here, because dataPacket is null

I can fix it by removing one line:

msg.clearDiscoveryData();

#processNodeAddedMessage() is really complex; it takes a while to understand 
what is going on in there.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-13 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967592#comment-15967592
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

[~sharpler], thanks for the update! When do you think we can expect the fix? 
This issue is a good candidate for merging to 2.0.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-13 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967563#comment-15967563
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]
I have fixed the bug. It was a data race on the local node's internal 
order in ServerImpl. This code is not mine, so I don't know why it works in 
master. Anyway, after my fix this test became completely stable. But I found an 
error in my own test. That is strange, because I checked for errors before 
making the PR. Perhaps something broke after the merge with 2.0. I need more 
time to fix it.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-07 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961196#comment-15961196
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

Yakov,
It's a very strange error. Thank you for finding it. The error happens with 13% 
probability. I need more time to fix it; please give me until Monday.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-05 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957944#comment-15957944
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

Alexander, 

I checked out your changes to finalize and commit, but discovered this failure:

org.apache.ignite.spi.discovery.tcp.TcpDiscoverySelfTest#testFailedNodes4

{noformat}
[02:41:39,056][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi]
 TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in 
order to prevent cluster wide instability.
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode 
[id=4215172c-d71b-4fd1-8baf-73ad9762, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, 
lastExchangeTime=1491432076544, loc=true, ver=2.0.0#19700101-sha1:, 
clusterRegionId=-9223372036854775808, isClient=false], other=TcpDiscoveryNode 
[id=7063b039-493e-4aad-9036-30c96210, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:, 
clusterRegionId=-9223372036854775808, isClient=false]]
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
at 
org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
at 
org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
at java.util.TreeMap.compare(TreeMap.java:1291)
at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
at 
java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
at java.util.TreeSet.isEmpty(TreeSet.java:216)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2676)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4940)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2547)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2349)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6398)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2435)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[02:41:39,059][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi]
 Runtime error caught during grid runnable execution: IgniteSpiThread 
[name=tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%]
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode 
[id=4215172c-d71b-4fd1-8baf-73ad9762, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, 
lastExchangeTime=1491432076544, loc=true, ver=2.0.0#19700101-sha1:, 
clusterRegionId=-9223372036854775808, isClient=false], other=TcpDiscoveryNode 
[id=7063b039-493e-4aad-9036-30c96210, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:, 
clusterRegionId=-9223372036854775808, isClient=false]]
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
at 
org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
at 
org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
at java.util.TreeMap.compare(TreeMap.java:1291)
at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
at 
java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
at java.util.TreeSet.isEmpty(TreeSet.java:216)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
at 

[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-04-04 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954971#comment-15954971
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]

I created an Upsource review: 
http://reviews.ignite.apache.org/ignite/review/IGNT-CR-158
Please look at the code and my answer on GitHub. I want to include these changes 
in the 2.0 release, but time is running out.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-03-29 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947195#comment-15947195
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]

I applied the changes and answered the questions. If, after my answer in the 
review, you still think it is better to make the default region ID zero rather 
than Long.MIN_VALUE, I will accept that.



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-03-29 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947080#comment-15947080
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

Alexander, please review my comments in the PR.

Next time, can you please create an Upsource review?



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-03-27 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943319#comment-15943319
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

[~yzhdanov]

I reopened the PR: https://github.com/apache/ignite/pull/1676/files



[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943297#comment-15943297
 ] 

ASF GitHub Bot commented on IGNITE-4501:


GitHub user SharplEr opened a pull request:

https://github.com/apache/ignite/pull/1676

IGNITE-4501: Improvement of connection in a cluster of new node

Reopen that PR: https://github.com/apache/ignite/pull/1436

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SharplEr/ignite ignite-4501

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/1676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1676


commit 0e7768224e8ae26797460e2b750d04979c49c92f
Author: Alexander Menshikov 
Date:   2017-01-20T12:35:05Z

Add using CLUSTER_REGION_ID for ordering nodes in ring

commit c43894854a2bdcb5d39d92552f0e105c70a10198
Author: Alexander Menshikov 
Date:   2017-01-20T12:35:53Z

Add tests for RegionNodeComparator

commit 5a7d5e395c00ab7e6eafa03ed29ae4eea754cbc1
Author: Alexander Menshikov 
Date:   2017-01-20T12:44:28Z

Add test to suite

commit cdcda327773b61beb90539e7c11130906d0525dc
Author: Alexander Menshikov 
Date:   2017-01-21T10:57:54Z

A little clean up code

commit 620be443f3ef7bd4730ba755fb4e110d8dcd4918
Author: Alexander Menshikov 
Date:   2017-01-25T12:26:42Z

Change the ordering inside the nodes field and add a maxNode field to avoid 
using the old ordering. Add a variant of the serverNodes method that returns 
only the necessary part of the ring. This makes the complexity of nextNode 
O(log n + k), which is better than O(n), where n is the number of nodes and 
k is the number of client nodes.

commit b45bad8b5dcf9d3904805503b1c327d4a6686d60
Author: Alexander Menshikov 
Date:   2017-01-25T13:03:27Z

Add field to TcpDiscoveryNode for faster getting cluster_region_id

commit d9b68dd3cc4f546d73c475c7adf82ea797d851f2
Author: Alexander Menshikov 
Date:   2017-01-27T15:01:38Z

Fix bug in RegionNodeComparator

commit f36f07da21ca51bf2585fb80b82239541a3a16d6
Author: Alexander Menshikov 
Date:   2017-01-27T15:02:30Z

Cleanup code in RegionNodeComparatorTest

commit 145394cb5c10e5f1127071f1cb7957a3c1307a9d
Author: Alexander Menshikov 
Date:   2017-01-27T15:09:55Z

Add catch NumberFormatException

commit eeaf244788bd957a2b8f1e3b33d00338072edde1
Author: Alexander Menshikov 
Date:   2017-01-27T15:34:37Z

cleanup TcpDiscoveryNodesRing

commit a3431dec0103f815803af744ec3c75b7188c19fb
Author: Alexander Menshikov 
Date:   2017-01-30T10:37:16Z

Code cleanup

commit a41ee660eb275714bd924dfae7bf01c3274300f1
Author: Alexander Menshikov 
Date:   2017-01-30T11:58:26Z

Add @NotNull to RegionNodeComparator.compare

commit 8ebd887797cba7adf3ca08ea8af8ac43e29be221
Author: Alexander Menshikov 
Date:   2017-01-30T12:06:09Z

Add @Nullable to TcpDiscoveryNode.getClusterRegionId

commit fd91a18a421e951638c9f9078c34169916adde5e
Author: Alexander Menshikov 
Date:   2017-01-30T12:07:57Z

cleanup 
modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/TcpDiscoveryNodesRing.java

commit f4652c888d07c1f52945c801be3e220d55895860
Author: Alexander Menshikov 
Date:   2017-01-30T16:31:44Z

Remove using cast to Number

commit 6a3ac2eccd7a01bfde73def424f94d80da0c75c1
Author: Alexander Menshikov 
Date:   2017-01-30T16:37:20Z

restore code format

commit ecb82ffd908cf3de8618c8d557d4077f9341e3cc
Author: Alexander Menshikov 
Date:   2017-01-30T16:45:14Z

restore code format2

commit 2e3ec7797327d722160bdee1b0803508ce9b4058
Author: Alexander Menshikov 
Date:   2017-02-01T17:55:24Z

fix reusing variable

commit 77055549e20248c0f22eefc43e839ba82a3f3ecc
Author: Alexander Menshikov 
Date:   2017-02-01T17:57:55Z

Fix lost regionId after deserialization

commit 30d3cd992d0c4cf76b2b368c7a336863eb3b5e2d
Author: Alexander Menshikov 
Date:   2017-02-01T17:58:57Z

Add test for save sorting

commit 3e204be6d14d356d6cda6c11f2cd8d730348f44d
Author: Alexander Menshikov 
Date:   2017-02-02T11:25:24Z

Add RegionTcpDiscoverySelfTest

commit f1bdd6c936f7feae2535ab3f4dae8dddc9a4cde0
Author: Alexander Menshikov 
Date:   2017-02-02T11:31:08Z

Add license

commit 5e9343b96ecdcf25f0bdabf7468a2bd503ec7773
Author: Alexander Menshikov 
Date:   
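
The commit of 2017-01-25 above says that keeping the nodes in sorted order lets nextNode avoid a full O(n) scan. The following is a minimal sketch of that idea, not the code from the pull request: the class name SortedRingSketch, the use of plain Long IDs in place of discovery nodes, and the "skipped" set (standing in for nodes that must be passed over, such as client nodes) are all assumptions made for illustration. It relies only on java.util.TreeSet, whose tailSet and headSet views start iterating after one tree lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; names and types are hypothetical. */
final class SortedRingSketch {
    /** Ring members kept in sorted order; Long IDs stand in for discovery nodes. */
    private final TreeSet<Long> ring = new TreeSet<>();

    void add(long nodeId) {
        ring.add(nodeId);
    }

    /**
     * Returns the first node after {@code local} in ring order that is not in
     * {@code skipped}, wrapping around the end of the order, or null if no
     * such node exists.
     */
    Long nextNode(long local, Set<Long> skipped) {
        // Nodes strictly after the local node, in sorted order.
        for (Long n : ring.tailSet(local, false)) {
            if (!skipped.contains(n))
                return n;
        }
        // Wrap around: nodes strictly before the local node.
        for (Long n : ring.headSet(local, false)) {
            if (!skipped.contains(n))
                return n;
        }
        return null;
    }

    public static void main(String[] args) {
        SortedRingSketch s = new SortedRingSketch();
        for (long id : new long[] {1, 2, 3, 5})
            s.add(id);

        Set<Long> none = Collections.emptySet();
        System.out.println(s.nextNode(2, none)); // prints 3
        System.out.println(s.nextNode(5, none)); // prints 1 (wraps around)
        System.out.println(s.nextNode(3, new HashSet<>(Arrays.asList(5L)))); // prints 1
    }
}
```

In this sketch the cost is one tree lookup to position the iterator plus one step per skipped node, which has the same general shape as the O(log n + k) bound quoted in the commit message.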

[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-02-22 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878011#comment-15878011
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

Yakov, I fixed it. Please look again.

> Improvement of connection in a cluster of new node
> --
>
> Key: IGNITE-4501
> URL: https://issues.apache.org/jira/browse/IGNITE-4501
> Project: Ignite
> Issue Type: Improvement
> Components: messaging
> Affects Versions: 1.8
> Reporter: Vyacheslav Daradur
> Assignee: Alexander Menshikov
> Fix For: 2.0
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, with 6 nodes A, B, C, D, E, F, 
> the ring can be formed in any order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, 
> etc.
> If a node leaves the topology, its adjacent nodes must reconnect. 
> If nodes A, B, C are in one physical location, nodes D, E, F are in another, 
> and the two locations lose connectivity to each other, the number of 
> reconnections depends on the ring order.
> In the best case, with the ring A-B-CxD-E-FxA ('x' means disconnect), 
> only one reconnect is needed (C will be connected to A, or F will be 
> connected to D, depending on which part of the cluster stays alive).
> In the worst case, with a badly ordered ring AxFxBxExCxDxA, many 
> reconnections are needed (A to B, B to C, C to A; in general n/2 
> reconnections, where n is the number of nodes). 
> h3. Approach:
> We need a way to insert each new node at the correct place 
> so that the ring topology stays correctly ordered.
> h3. Solutions:
> The main idea is to sort nodes according to latency.
> * group nodes into arcs by an ARC_ID (manually?).
> * implement NodeComparator (nodes on the same host : nodes on the same subnet 
> : other nodes). We will use it when we connect a new node.
> * [dev list 
> thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=ke83-5-r...@mail.gmail.com%3E]
> Update Dec, 29 Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a 
> public static final constant to TcpDiscoverySpi.
> # Alter 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection)
>  to order nodes based on a per-node attribute value.
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs 
> are equal then we should compare nodes' IDs. This way we have a consistent 
> order on all nodes in the topology.
> # Also nextNode() has to group nodes on the same host and in the same subnet. 
> This can be postponed and implemented after the other points are done.





[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-02-20 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874651#comment-15874651
 ] 

Yakov Zhdanov commented on IGNITE-4501:
---

Alexander, I reviewed the changes. Please see my comments on the GitHub PR.

I think it is safe to release it in 2.0.

Thanks!






[jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node

2017-02-15 Thread Alexander Menshikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867564#comment-15867564
 ] 

Alexander Menshikov commented on IGNITE-4501:
-

The work is already done. I'm just waiting for code review.



