[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426393#comment-17426393 ]

Jon Meredith commented on CASSANDRA-15318:
------------------------------------------

The original motivation for the patch was to mitigate the impact of an outage for write queries with CL.EACH_QUORUM, where the first node in the forwarding list was considered UP but was in a state where it did not forward traffic to the others. Shuffling with equal probability means 2/3 of the traffic gets through instead of failing. Sorting or weighting by proximity might help performance, but would probably negate some of the resilience this change was intended to provide.

> sendMessagesToNonlocalDC() should shuffle targets
> -------------------------------------------------
>
>                 Key: CASSANDRA-15318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0-alpha3, 4.0
>
>
> To better spread load and reduce the impact of a node failure before
> detection (or other issues, such as issues with host replacement), when
> forwarding messages to other data centers the forwarding node in each
> non-local DC should be selected at random rather than always selecting
> the first node in the list of endpoints for a token.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
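The 2/3 figure in the comment above can be checked with a small model. This is only a sketch under stated assumptions, not Cassandra code: it assumes three replicas in the remote DC (which the 2/3 figure suggests), that exactly one of them is marked UP but never forwards, and that a write gets through only if its chosen forwarder works. The class name, method name, and node names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical model for illustration only; this is not Cassandra code.
final class ForwarderOutageModel
{
    /**
     * Returns the fraction of writes whose chosen forwarder actually forwards.
     *
     * @param replicas  replicas in the remote DC, in endpoint-list order
     * @param stuckNode the replica that is considered UP but drops forwards
     * @param shuffle   false: always use the first replica; true: uniform random pick
     */
    static double successRate(List<String> replicas, String stuckNode,
                              boolean shuffle, int writes, Random random)
    {
        int forwarded = 0;
        for (int i = 0; i < writes; i++)
        {
            int index = shuffle ? random.nextInt(replicas.size()) : 0;
            if (!replicas.get(index).equals(stuckNode))
                forwarded++;
        }
        return (double) forwarded / writes;
    }

    public static void main(String[] args)
    {
        // Assumed topology: three replicas, the first one is stuck.
        List<String> remoteDc = Arrays.asList("node1", "node2", "node3");
        Random random = new Random();
        System.out.println("first node always: "
                           + successRate(remoteDc, "node1", false, 100_000, random));
        System.out.println("random forwarder:  "
                           + successRate(remoteDc, "node1", true, 100_000, random));
    }
}
```

With the stuck node first in the list, the fixed choice forwards nothing, while the random choice forwards roughly two writes in three. The same model also shows the trade-off raised in the thread: any scheme that consistently prefers one node (for example, the closest) behaves like the fixed choice whenever that preferred node is the stuck one.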
[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426367#comment-17426367 ]

Paulo Motta commented on CASSANDRA-15318:
-----------------------------------------

[~jmeredithco] Hi Jon, I just came across this issue and was wondering whether you think it would be worth sorting nodes by proximity when the dynamic snitch is enabled, to avoid stragglers when picking non-local forwarders?
[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976809#comment-16976809 ]

Dinesh Joshi commented on CASSANDRA-15318:
------------------------------------------

+1, but before merging let's ensure that the test failures are unrelated.
[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976802#comment-16976802 ]

Jon Meredith commented on CASSANDRA-15318:
------------------------------------------

Rebased and rerunning to double-check some (thought to be) unrelated unit test failures.

[CircleCI|https://circleci.com/workflow-run/c6247670-e965-4260-9632-5bd3deb9ad06]
[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947206#comment-16947206 ]

Jon Meredith commented on CASSANDRA-15318:
------------------------------------------

Created a new branch against trunk now that CASSANDRA-15319 is merged.

[Branch|https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15318]
[GitHub PR|https://github.com/apache/cassandra/pull/363]
[CircleCI|https://circleci.com/workflow-run/cbd7e18d-e452-4a8c-aa84-29f1285d38ba]
[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets
[ https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925876#comment-16925876 ]

Jon Meredith commented on CASSANDRA-15318:
------------------------------------------

Linked is a patch to shuffle the node used for forwarding messages to remote DCs. The MessageForwardingTest has been updated to verify that the remote node is picked as the forwarder at least once. It doesn't check fairness; I didn't want to risk any flaky tests due to randomness.

trunk [changes | https://github.com/jonmeredith/cassandra/pull/1 ]
[trunk CircleCI | https://circleci.com/workflow-run/c418-f747-4665-be7a-7a97ffcc]

(The PR is against my local CASSANDRA-15319 branch to make the diff clear; I can retarget/reopen as needed.)
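As a rough illustration of the approach described in this comment, the sketch below shuffles the targets for one remote DC so that the forwarder (taken here to be the first element) varies between calls, and then checks only that every candidate is picked at least once over many attempts. It is not the code from the linked patch or from MessageForwardingTest: the class, method, and node names are hypothetical, and reading "picked at least once" as applying to every candidate is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; not the code from the CASSANDRA-15318 patch.
final class ForwarderShuffleSketch
{
    // Returns a shuffled copy of one remote DC's targets. The first element
    // plays the role of the forwarder; the rest are the nodes it forwards to.
    // The caller's list is left untouched.
    static List<String> shuffledTargets(List<String> dcTargets, Random random)
    {
        List<String> shuffled = new ArrayList<>(dcTargets);
        Collections.shuffle(shuffled, random);
        return shuffled;
    }

    // Collects which nodes were chosen as forwarder across repeated attempts.
    // This mirrors an "at least once" check, not a fairness check.
    static Set<String> forwardersSeen(List<String> dcTargets, int attempts, Random random)
    {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < attempts; i++)
            seen.add(shuffledTargets(dcTargets, random).get(0));
        return seen;
    }

    public static void main(String[] args)
    {
        List<String> remoteDc = Arrays.asList("node1", "node2", "node3");
        Set<String> seen = forwardersSeen(remoteDc, 1000, new Random());
        System.out.println("forwarders seen: " + seen);
    }
}
```

Checking membership rather than frequency keeps such a test stable: with three candidates, the chance that a given node is never picked in 1000 attempts is (2/3)^1000, which is negligible, whereas a fairness assertion would need statistical tolerances and could still fail occasionally.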