[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2019-02-14 Thread Dmitriy Govorukhin (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768745#comment-16768745 ]

Dmitriy Govorukhin commented on IGNITE-5569:
--------------------------------------------

[~sergey-chugunov] LGTM, thanks for the contribution! Merged to master.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A firewall configuration issue may effectively lead to a cluster DDoS. The
> scheme is as follows:
> 1) A node G joins the cluster, and a firewall rule forbids incoming
> connections from the cluster to this node
> 2) The cluster successfully processes NodeAddedMessage and fires a discovery
> NODE_JOINED event (not sure why?)
> 4) The last node in the ring fails to connect to the newly joined node and
> generates a NODE_FAILED event
> 5) The coordinator drops the connection, and the joining node attempts to
> connect again
> The issues I see here:
> 1) Neither the coordinator nor the joining node prints out the reason why the
> joining node failed / did not join. A slight hint (failed to send message to
> the next node) is printed on the node with the largest order (the one that
> attempted to close the ring), but the root cause (connection refused) is not
> printed either
> 2) The joining node attempts to connect to the cluster with the same node ID.
> This violates an invariant we heavily rely on: once a node ID leaves a
> cluster, this ID never comes back again
> 3) Each discovery event leads to a partition exchange, which blocks all cache
> operations for a time interval at least equal to the full ring latency.
> If several nodes are started on a malicious host, this may lead to almost
> full cluster degradation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2019-02-14 Thread Ignite TC Bot (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768740#comment-16768740 ]

Ignite TC Bot commented on IGNITE-5569:
---------------------------------------

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3085164&buildTypeId=IgniteTests24Java8_RunAll]

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2019-02-05 Thread Alexey Goncharuk (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760650#comment-16760650 ]

Alexey Goncharuk commented on IGNITE-5569:
------------------------------------------

I've recently stumbled upon another duplicate discovery notification case, in
which no firewall was involved. It looks like the ring can "forget" about a
node failure event and process the node's join request again. I think we can
introduce a limited history of nodes that have ever joined and forbid a node
from joining (send an error response to the join request and drop the node
added message) if that node is present in the history.
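
The bounded-history idea could be sketched roughly as follows. This is only an
illustration and not the code from the actual patch: the class name
JoinedNodesHistory, the method tryRegister and the size limit are made up for
this example, and a real implementation would sit inside the discovery SPI's
join request handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical bounded history of node IDs that have ever joined the ring. */
final class JoinedNodesHistory {
    /** Insertion-ordered map; the oldest ID is evicted once the limit is exceeded. */
    private final Map<UUID, Boolean> hist;

    /** @param maxSize Maximum number of node IDs to remember (assumed tunable). */
    JoinedNodesHistory(final int maxSize) {
        this.hist = new LinkedHashMap<UUID, Boolean>() {
            @Override protected boolean removeEldestEntry(Map.Entry<UUID, Boolean> eldest) {
                return size() > maxSize;
            }
        };
    }

    /**
     * Records a join attempt.
     *
     * @param nodeId ID of the joining node.
     * @return true if the ID was not in the history (join may proceed),
     *         false if it was already seen (join should be rejected).
     */
    synchronized boolean tryRegister(UUID nodeId) {
        if (hist.containsKey(nodeId))
            return false;

        hist.put(nodeId, Boolean.TRUE);

        return true;
    }
}
```

With such a structure, the coordinator would call tryRegister(nodeId) when it
receives a join request and answer with an error instead of sending the node
added message around the ring when the call returns false. Because the history
is limited, a very old ID can be accepted again once it has been evicted.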

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.8
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-11-27 Thread Alexey Goncharuk (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700647#comment-16700647 ]

Alexey Goncharuk commented on IGNITE-5569:
------------------------------------------

Cannot merge without conflicts; a pull from master is needed.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.8
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-09-21 Thread Nikolay Izhikov (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623501#comment-16623501 ]

Nikolay Izhikov commented on IGNITE-5569:
-----------------------------------------

[~dkarachentsev] Do we have a chance to resolve this ticket before the 2.7
code freeze?

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-08-17 Thread Dmitry Karachentsev (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583791#comment-16583791 ]

Dmitry Karachentsev commented on IGNITE-5569:
---------------------------------------------

[~dpavlov] thank you! Here is the checklist status:
1.a API compatibility MUST be maintained between minor releases.
No API changes.
1.b Default behavior SHOULD NOT be changed between minor releases, unless 
absolutely needed.
Default behavior changed a bit, but it's a fix.
1.c New operation MUST be well-documented in code.
Javadoc is present.
1.d API parity between Java and .NET platforms
No changes needed.
1.e API parity between thin clients (Java, .NET)
No changes needed.
1.f All exceptions thrown to a user SHOULD have explanation how to resolve, 
workaround or debug an error.
Yes.
2. Compatibility.
Does not affect compatibility.
3. Tests
Done.
4. Codestyle.
Done.

[~yzhdanov] Please review.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-08-17 Thread Dmitriy Pavlov (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583765#comment-16583765 ]

Dmitriy Pavlov commented on IGNITE-5569:
----------------------------------------

[~dkarachentsev] tests seem to be more or less OK, but still, I've retriggered
the failed suites.

Could you please go through the review checklist items and share the results
as a JIRA comment?

https://lists.apache.org/thread.html/3196274d0be41ebd722536542914a0d86bab9d6764d14217681dedb3@%3Cdev.ignite.apache.org%3E

See https://cwiki.apache.org/confluence/display/IGNITE/Review+Checklist

E.g. the comment can be as follows:
1.a API compatibility MUST be maintained 
No API changes
1.b Default behavior SHOULD NOT be changed
Default behavior was not changed.
etc.

Then we can ask Yakov to review.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-07-18 Thread Dmitry Karachentsev (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547743#comment-16547743 ]

Dmitry Karachentsev commented on IGNITE-5569:
---------------------------------------------

[TC|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&branch_IgniteTests24Java8=pull%2F2554%2Fhead] looks OK.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-07-17 Thread Sergey Chugunov (JIRA)


    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546714#comment-16546714 ]

Sergey Chugunov commented on IGNITE-5569:
-----------------------------------------

[~dkarachentsev],

The improvement looks good to me; let's wait for the TC status and proceed
with merging if it's OK.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2018-05-03 Thread Alexey Goncharuk (JIRA)

    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462474#comment-16462474 ]

Alexey Goncharuk commented on IGNITE-5569:
------------------------------------------

[~yzhdanov] Can you please take a look at the fix? The changes look legit to me.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.6
>
>


[jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

2017-09-28 Thread Vladimir Ozerov (JIRA)

    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184131#comment-16184131 ]

Vladimir Ozerov commented on IGNITE-5569:
-----------------------------------------

Moved to 2.4 due to inactivity.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>             Fix For: 2.4
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)