[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-11-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231782#comment-17231782
 ] 

David Capwell edited comment on CASSANDRA-15158 at 11/13/20, 8:38 PM:
--

Starting commit

CI Results: Yellow. 3.11's org.apache.cassandra.service.MigrationCoordinatorTest 
fails in CI but passes locally; -trunk's 
org.apache.cassandra.distributed.test.ring.BootstrapTest fails frequently due 
to schemas not being present; added a commit which increases the timeout from 
30s to 90s-; and other expected issues.
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.0|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/200/]|
|cassandra-3.11|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/201/]|
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/202/]|



was (Author: dcapwell):
Starting commit

CI Results: Yellow. 3.11's org.apache.cassandra.service.MigrationCoordinatorTest 
fails in CI but passes locally; trunk's 
org.apache.cassandra.distributed.test.ring.BootstrapTest fails frequently due 
to schemas not being present; added a commit which increases the timeout from 
30s to 90s; and other expected issues.
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.0|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/200/]|
|cassandra-3.11|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/201/]|
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/202/]|


> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-11-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231782#comment-17231782
 ] 

David Capwell edited comment on CASSANDRA-15158 at 11/13/20, 8:20 PM:
--

Starting commit

CI Results: Yellow. 3.11's org.apache.cassandra.service.MigrationCoordinatorTest 
fails in CI but passes locally; trunk's 
org.apache.cassandra.distributed.test.ring.BootstrapTest fails frequently due 
to schemas not being present; added a commit which increases the timeout from 
30s to 90s; and other expected issues.
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.0|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/200/]|
|cassandra-3.11|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/201/]|
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/202/]|



was (Author: dcapwell):
Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.0|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/200/]|
|cassandra-3.11|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/201/]|
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/202/]|


> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-10-30 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216816#comment-17216816
 ] 

Aleksey Yeschenko edited comment on CASSANDRA-15158 at 10/30/20, 4:07 PM:
--

Left a small comment on the 3.0 branch. Also, the following nits for 
{{MigrationCoordinator}}:
1. A bunch of unused imports
2. {{shouldApplySchemaFrom()}} has an unused argument
3. {{requestQueue}} could be an {{ArrayDeque}} instead of a {{LinkedList}} - 
should set a good example for anyone randomly reading this code, even if it's 
not critical to do the right thing in this context
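Illustratively, for the third nit (the element type here is hypothetical, not 
taken from {{MigrationCoordinator}}):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// same FIFO behaviour as a LinkedList, with better locality and no per-node
// allocation; the element type is hypothetical, not MigrationCoordinator's
private final Deque<Runnable> requestQueue = new ArrayDeque<>();
{code}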

EDIT: LGTM, +1, ship it


was (Author: iamaleksey):
Left a small comment on the 3.0 branch. Also, the following nits for 
{{MigrationCoordinator}}:
1. A bunch of unused imports
2. {{shouldApplySchemaFrom()}} has an unused argument
3. {{requestQueue}} could be an {{ArrayDeque}} instead of a {{LinkedList}} - 
should set a good example for anyone randomly reading this code, even if it's 
not critical to do the right thing in this context

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  
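In code terms, the agreement check the description talks about amounts to 
comparing our own schema version against what every live node advertises via 
gossip, instead of waiting on per-request latches. A minimal sketch against 
the 3.x gossip/schema API (an illustration only, not the committed 
implementation):

{code:java}
import java.net.InetAddress;
import java.util.UUID;

import org.apache.cassandra.config.Schema;
import org.apache.cassandra.gms.ApplicationState;
import org.apache.cassandra.gms.EndpointState;
import org.apache.cassandra.gms.Gossiper;
import org.apache.cassandra.gms.VersionedValue;

// Minimal sketch: true once every live member gossips our schema version.
public static boolean allLiveNodesAgreeOnSchema()
{
    UUID local = Schema.instance.getVersion();
    for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
    {
        EndpointState state = Gossiper.instance.getEndpointStateForEndpoint(endpoint);
        VersionedValue schema = state == null ? null : state.getApplicationState(ApplicationState.SCHEMA);
        // a missing or different schema version means no agreement yet
        if (schema == null || !local.equals(UUID.fromString(schema.value)))
            return false;
    }
    return true;
}
{code}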



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193009#comment-17193009
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/9/20, 4:50 PM:


I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.
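The shape of that scenario, roughly, against the in-jvm dtest API (a sketch 
under assumptions - the builder, filter, and verb calls are what I believe the 
framework exposes; this is not the linked test):

{code:java}
import org.apache.cassandra.distributed.Cluster;
import org.apache.cassandra.distributed.api.Feature;
import org.apache.cassandra.distributed.api.IInstanceConfig;
import org.apache.cassandra.distributed.api.IInvokableInstance;
import org.apache.cassandra.distributed.api.IMessageFilters;
import org.apache.cassandra.net.MessagingService;

// Sketch: 2-node cluster, a 3rd node joins while its schema migration
// traffic is dropped, then the "network" heals and messages flow again.
// (Assumed to run inside a test method declared to throw Exception.)
try (Cluster cluster = Cluster.build(2)
                              .withConfig(c -> c.with(Feature.GOSSIP, Feature.NETWORK))
                              .start())
{
    IMessageFilters.Filter drop =
        cluster.filters().verbs(MessagingService.Verb.MIGRATION_REQUEST.ordinal()).drop();

    IInstanceConfig config = cluster.newInstanceConfig();
    IInvokableInstance node3 = cluster.bootstrap(config);
    new Thread(node3::startup).start(); // join should now block on schema agreement

    Thread.sleep(10_000); // simulated communication-error window
    drop.off();           // migration messages start to flow again
}
{code}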

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.
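The interception itself, schematically (a sketch, not the exact code from the 
test; it has to run before the instance's class loader loads 
MigrationManager):

{code:java}
import net.bytebuddy.ByteBuddy;
import net.bytebuddy.dynamic.loading.ClassLoadingStrategy;
import net.bytebuddy.implementation.StubMethod;
import static net.bytebuddy.matcher.ElementMatchers.named;

// Sketch: rebase MigrationManager so scheduleSchemaPull() becomes a no-op,
// keeping schema pulls from being sent outside of the logic under test.
new ByteBuddy()
    .rebase(MigrationManager.class)
    .method(named("scheduleSchemaPull"))
    .intercept(StubMethod.INSTANCE)
    .make()
    .load(MigrationManager.class.getClassLoader(), ClassLoadingStrategy.Default.INJECTION);
{code}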

The onChange method called from onJoin already merges the schema when the 
state is SCHEMA, so I am not completely sure why we also schedule a schema 
pull on join:

{code:java}
public void onJoin(InetAddress endpoint, EndpointState epState)
{
    for (Map.Entry<ApplicationState, VersionedValue> entry : epState.states())
    {
        onChange(endpoint, entry.getKey(), entry.getValue());
    }

    // this is weird: onChange() above already merges the schema when the
    // SCHEMA application state is present, yet another pull is scheduled here
    MigrationManager.instance.scheduleSchemaPull(endpoint, epState);
}

public void onAlive(InetAddress endpoint, EndpointState state)
{
    // this is weird as well: another pull outside the waitForSchema() logic
    MigrationManager.instance.scheduleSchemaPull(endpoint, state);
    if (tokenMetadata.isMember(endpoint))
        notifyUp(endpoint);
}
{code}


(1) 
https://github.com/instaclustr/cassandra/blob/15158-original-fix/test/distributed/org/apache/cassandra/distributed/test/BootstrappingSchemaAgreementTest.java




was (Author: stefan.miklosovic):
I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.

The onChange method called from onJoin already merges the schema when the 
state is SCHEMA, so I am not completely sure why we also schedule a schema 
pull on join:

{code:java}

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193009#comment-17193009
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/9/20, 4:47 PM:


I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.

The onChange method called from onJoin already merges the schema when the 
state is SCHEMA, so I am not completely sure why we also schedule a schema 
pull on join:

{code:java}
public void onJoin(InetAddress endpoint, EndpointState epState)
{
    for (Map.Entry<ApplicationState, VersionedValue> entry : epState.states())
    {
        onChange(endpoint, entry.getKey(), entry.getValue());
    }

    // this is weird: onChange() above already merges the schema when the
    // SCHEMA application state is present, yet another pull is scheduled here
    MigrationManager.instance.scheduleSchemaPull(endpoint, epState);
}

public void onAlive(InetAddress endpoint, EndpointState state)
{
    // this is weird as well: another pull outside the waitForSchema() logic
    MigrationManager.instance.scheduleSchemaPull(endpoint, state);
    if (tokenMetadata.isMember(endpoint))
        notifyUp(endpoint);
}
{code}


(1) 
https://github.com/instaclustr/cassandra/blob/0ceb1d6edb55916e68ae436e99c932e5ce28f68a/test/distributed/org/apache/cassandra/distributed/test/BootstrappingSchemaAgreementTest.java




was (Author: stefan.miklosovic):
I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.

(1) 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193009#comment-17193009
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/9/20, 4:19 PM:


I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.

(1) 
https://github.com/instaclustr/cassandra/blob/0ceb1d6edb55916e68ae436e99c932e5ce28f68a/test/distributed/org/apache/cassandra/distributed/test/BootstrappingSchemaAgreementTest.java




was (Author: stefan.miklosovic):
I have improved my original work and wrote a test for it. The Jenkins build 
does not fail anymore, so I believe this solution is on par with the other one 
when it comes to dtests, as I do not have time to fix the dtests which the 
other solution breaks. While I admit that the improved version is technically 
superior, having a clean build and the same dtest behaviour is more important 
to me at this moment. It would be awesome if the dtests and the issues I 
spotted were resolved, though.

The test is here (1). The main logic is that a cluster of two nodes is 
started, a third node is started afterwards, and all migration messages from 
it to the other two are dropped, simulating a communication error between 
them. After some time, migration messages start to flow again. By doing this, 
I test the internals of the logic I wrote, and it seems to do its job.

One issue I am a little bit concerned about is that StorageService issues 
schema migration requests in "onAlive, onJoin ..." and these requests are not 
part of the waitForSchema() logic. It is understandable that it is like that, 
as we need to track migration requests after a node has fully bootstrapped, 
but we should skip this while a node is still bootstrapping. I wrapped the 
bodies of these methods in "if (hasJoined())" but it was invoked anyway. 
However, it does not matter too much that this is outside the logic I wrote, 
because if the schema migration was successful, the rewritten logic in 
waitForSchema() has nothing left to deal with, so we are done anyway. To skip 
this in the test, I used ByteBuddy to intercept 
MigrationManager#scheduleSchemaPull and make it do nothing, so migration pulls 
are effectively not sent outside of my change.

(1) 
https://github.com/instaclustr/cassandra/blob/0ceb1d6edb55916e68ae436e99c932e5ce28f68a/test/distributed/org/apache/cassandra/distributed/test/BootstrappingSchemaAgreementTest.java



> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191705#comment-17191705
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/7/20, 1:49 PM:


I am getting this exception on a totally clean node while bootstrapping a 
cluster of 3 nodes:


{code:java}
cassandra_node_1| INFO  [ScheduledTasks:1] 2020-09-07 15:10:13,037 
TokenMetadata.java:517 - Updating topology for all endpoints that have changed
cassandra_node_1| INFO  [HANDSHAKE-spark-master-1/172.19.0.5] 2020-09-07 
15:10:13,311 OutboundTcpConnection.java:561 - Handshaking version with 
spark-master-1/172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,870 
Gossiper.java:1141 - Node /172.19.0.5 is now part of the cluster
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,904 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,907 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:14,052 
Gossiper.java:1103 - InetAddress /172.19.0.5 is now UP
cassandra_node_1| WARN  [MessagingService-Incoming-/172.19.0.5] 2020-09-07 
15:10:14,119 IncomingTcpConnection.java:103 - UnknownColumnFamilyException 
reading from socket; closing
cassandra_node_1| org.apache.cassandra.db.UnknownColumnFamilyException: 
Couldn't find table for cfId 5bc52802-de25-35ed-aeab-188eecebb090. If a table 
was just created, this is likely due to the schema not being fully propagated.  
Please wait for schema agreement on table creation.
cassandra_node_1|   at 
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1578)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:899)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:874)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) 
~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]

{code}

That cfId stands for system_auth/roles. It seems we are applying mutations 
before schema agreement has occurred, so the table is not there yet to apply 
them against.

This is the log from the second node. The first one booted fine, the second 
one throws this, and the third one boots fine. Eventually everything seems to 
be fine; however, that exception is ... concerning.



was (Author: stefan.miklosovic):
I am getting this exception on a totally clean node while bootstrapping a 
cluster of 3 nodes:


{code:java}
cassandra_node_1| INFO  [ScheduledTasks:1] 2020-09-07 15:10:13,037 
TokenMetadata.java:517 - Updating topology for all endpoints that have changed
cassandra_node_1| INFO  [HANDSHAKE-spark-master-1/172.19.0.5] 2020-09-07 
15:10:13,311 OutboundTcpConnection.java:561 - Handshaking version with 
spark-master-1/172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,870 
Gossiper.java:1141 - Node /172.19.0.5 is now part of the cluster
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,904 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,907 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:14,052 
Gossiper.java:1103 - InetAddress /172.19.0.5 is now UP
cassandra_node_1| WARN  [MessagingService-Incoming-/172.19.0.5] 2020-09-07 
15:10:14,119 IncomingTcpConnection.java:103 - UnknownColumnFamilyException 
reading from socket; closing
cassandra_node_1| org.apache.cassandra.db.UnknownColumnFamilyException: 
Couldn't find table for cfId 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191711#comment-17191711
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/7/20, 1:42 PM:


There is also a runtime error, as that concurrent hash map from that package 
is not on the class path. I removed it here; I just squashed all the changes 
in Blake's branch plus this one fix:

https://github.com/instaclustr/cassandra/commit/e23677deeb7c836b4b7c80f98009353668351620


was (Author: stefan.miklosovic):
There is also a runtime error, as that concurrent hash map from that package 
is not on the class path. I removed it here; I just squashed all the changes 
in Blake's branch plus this one fix:

https://github.com/instaclustr/cassandra/commit/af82bc2f1a4f9eff09458101c63027e919873af9

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191711#comment-17191711
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/7/20, 1:37 PM:


There is also a runtime error, as that concurrent hash map from that package 
is not on the class path. I removed it here; I just squashed all the changes 
in Blake's branch plus this one fix:

https://github.com/instaclustr/cassandra/commit/af82bc2f1a4f9eff09458101c63027e919873af9


was (Author: stefan.miklosovic):
There is also a runtime error, as that concurrent hash map from that package 
is not on the class path. I removed it here; I just squashed all the changes 
in Blake's branch plus this one fix:

https://github.com/instaclustr/cassandra/commit/af82bc2f1a4f9eff09458101c63027e919873af9

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191705#comment-17191705
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 9/7/20, 1:18 PM:


I am getting this exception on a totally clean node while bootstrapping a 
cluster of 3 nodes:


{code:java}
cassandra_node_1| INFO  [ScheduledTasks:1] 2020-09-07 15:10:13,037 
TokenMetadata.java:517 - Updating topology for all endpoints that have changed
cassandra_node_1| INFO  [HANDSHAKE-spark-master-1/172.19.0.5] 2020-09-07 
15:10:13,311 OutboundTcpConnection.java:561 - Handshaking version with 
spark-master-1/172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,870 
Gossiper.java:1141 - Node /172.19.0.5 is now part of the cluster
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,904 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,907 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:14,052 
Gossiper.java:1103 - InetAddress /172.19.0.5 is now UP
cassandra_node_1| WARN  [MessagingService-Incoming-/172.19.0.5] 2020-09-07 
15:10:14,119 IncomingTcpConnection.java:103 - UnknownColumnFamilyException 
reading from socket; closing
cassandra_node_1| org.apache.cassandra.db.UnknownColumnFamilyException: 
Couldn't find table for cfId 5bc52802-de25-35ed-aeab-188eecebb090. If a table 
was just created, this is likely due to the schema not being fully propagated.  
Please wait for schema agreement on table creation.
cassandra_node_1|   at 
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1578)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:899)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:874)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) 
~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]
cassandra_node_1|   at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
 ~[apache-cassandra-3.11.9-SNAPSHOT.jar:3.11.9-SNAPSHOT]

{code}

That cfId stands for system_auth/roles. It seems we are applying mutations 
before schema agreement has occurred, so the table is not there yet to apply 
them against.



was (Author: stefan.miklosovic):
I am getting this exception on a totally clean node while bootstrapping a 
cluster of 3 nodes:


{code:java}
cassandra_node_1| INFO  [ScheduledTasks:1] 2020-09-07 15:10:13,037 
TokenMetadata.java:517 - Updating topology for all endpoints that have changed
cassandra_node_1| INFO  [HANDSHAKE-spark-master-1/172.19.0.5] 2020-09-07 
15:10:13,311 OutboundTcpConnection.java:561 - Handshaking version with 
spark-master-1/172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,870 
Gossiper.java:1141 - Node /172.19.0.5 is now part of the cluster
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,904 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:13,907 
TokenMetadata.java:497 - Updating topology for /172.19.0.5
cassandra_node_1| INFO  [GossipStage:1] 2020-09-07 15:10:14,052 
Gossiper.java:1103 - InetAddress /172.19.0.5 is now UP
cassandra_node_1| WARN  [MessagingService-Incoming-/172.19.0.5] 2020-09-07 
15:10:14,119 IncomingTcpConnection.java:103 - UnknownColumnFamilyException 
reading from socket; closing
cassandra_node_1| org.apache.cassandra.db.UnknownColumnFamilyException: 
Couldn't find table for cfId 5bc52802-de25-35ed-aeab-188eecebb090. If a table 
was just created, this is likely due to the schema not being fully propagated.  
Please wait for schema agreement on table creation.
cassandra_node_1|   at 

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-02 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189642#comment-17189642
 ] 

Blake Eggleston edited comment on CASSANDRA-15158 at 9/2/20, 7:18 PM:
--

Possibly; you still need to check in the submission task in case the node has 
died in the meantime. There would still be an intersection of node-flapping 
rate and unfortunate scheduling where the lockup could occur, though.

The queue, while a little awkward, also makes us a bit more resilient against 
other unanticipated states and/or bugs.
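The re-check described above, sketched with hypothetical names (the helper and 
the captured endpoint are illustration only, not the actual patch):

{code:java}
// Hypothetical shape of the check inside the submitted task: the endpoint may
// have died between enqueueing the pull and the task actually running.
Runnable pull = () ->
{
    if (!FailureDetector.instance.isAlive(endpoint))
        return; // died in the meantime; don't wait on a reply that never comes
    sendMigrationRequest(endpoint); // hypothetical helper
};
{code}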


was (Author: bdeggleston):
Possibly; you still need to check in the submission task in case the node has 
died in the meantime. There would still be an intersection of node-flapping 
rate and unfortunate scheduling where the lockup could occur, though.

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-09-01 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188581#comment-17188581
 ] 

Aleksey Yeschenko edited comment on CASSANDRA-15158 at 9/1/20, 3:49 PM:


Pushed some minor tweaks 
[here|https://github.com/iamaleksey/cassandra/commits/15158-review]. Made some 
bits more idiomatic, and changed the way in-flight requests are tracked.

In general, this does the job and solves the problem in the description. It 
doesn't, however, fully deal with storms in large clusters caused by a 
sequence of updates in quick succession, but it's not intended to, either.

EDIT: the amount of synchronisation here bothers me a tiny bit, as all of it 
will likely have to be removed eventually, when and if TPC happens, but I can 
live with it.


was (Author: iamaleksey):
Pushed some minor tweaks 
[here|https://github.com/iamaleksey/cassandra/commits/15158-review]. Made some 
bits more idiomatic, and changed the way in-flight requests are tracked.

In general, this does the job and solves the problem in the description. It 
doesn't, however, fully deal with storms in large clusters caused by a 
sequence of updates in quick succession, but it's not intended to, either.

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Blake Eggleston
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-06-11 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133717#comment-17133717
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 6/11/20, 9:58 PM:
-

Hi Blake,

 

Because of your very helpful explanation, I was able to put together yet 
another version of the solution to this problem. You will find it here:

[https://github.com/apache/cassandra/pull/628]

Thanks in advance for the review.


was (Author: stefan.miklosovic):
Hi Blake,

 

Because of your very helpful explanation, I was able to put together yet 
another version of the solution to this problem. You will find it here:

[https://github.com/apache/cassandra/compare/trunk...smiklosovic:CASSANDRA-15158-rework]

Thanks in advance for the review.

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Ben Bromhead
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-05-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102543#comment-17102543
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 12:49 PM:
-

It seems to me that one aspect of the PR was overlooked, so I will just 
iterate on that one. The mechanism for not flooding nodes with schema pull 
messages is incorporated in the loop over callbacks. If you look at it, there 
are sleeps of various lengths depending on whether a request has already been 
sent or not. This sleep actually "delays" the next schema pull from the other 
node, because during the sleep a schema could arrive from the node we just 
sent a message to, so on the next iteration, when another node is compared for 
schema equality, there may no longer be any need to pull from it because they 
are on par. Hence we are not blindly sending messages to all nodes (see the 
sketch below).
If there are discrepancies, there is a global timeout after which the whole 
bootstrapping process is treated as erroneous and (in the current code) we 
throw a ConfigurationException. This behaviour might be relaxed, but I 
consider it more appropriate to just throw there.
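Schematically, the loop described above looks like this (all names are 
hypothetical; the point is the already-sent check and the two sleep lengths):

{code:java}
// Sketch of the throttled pull loop (hypothetical names throughout).
while (!schemaAgreementReached() && !globalTimeoutExpired())
{
    for (InetAddress endpoint : liveEndpointsWithDifferentSchema())
    {
        if (!alreadyRequested.contains(endpoint))
        {
            sendSchemaPull(endpoint); // first pull for this endpoint
            alreadyRequested.add(endpoint);
            Uninterruptibles.sleepUninterruptibly(SHORT_PAUSE_MS, TimeUnit.MILLISECONDS);
        }
        else
        {
            // a reply may land during this longer pause, so the next schema
            // equality check can pass without sending another pull
            Uninterruptibles.sleepUninterruptibly(LONG_PAUSE_MS, TimeUnit.MILLISECONDS);
        }
    }
}
if (!schemaAgreementReached())
    throw new ConfigurationException("Could not reach schema agreement with live nodes in time");
{code}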


was (Author: stefan.miklosovic):
It seems to me that one aspect of the PR was overlooked, so I will just 
iterate on that one. The mechanism for not flooding nodes with schema pull 
messages is incorporated in the loop over callbacks. If you look at it, there 
are sleeps of various lengths depending on whether a request has already been 
sent or not. This sleep actually "delays" the next schema pull from the other 
node, because during the sleep a schema could arrive, so on the next 
iteration, when another node is compared for schema equality, there may no 
longer be any need to pull from it because they are on par. Hence we are not 
blindly sending messages to all nodes.
If there are discrepancies, there is a global timeout after which the whole 
bootstrapping process is treated as erroneous and (in the current code) we 
throw a ConfigurationException. This behaviour might be relaxed, but I 
consider it more appropriate to just throw there.

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Schema
>Reporter: Vincent White
>Assignee: Ben Bromhead
>Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or retrieval of the schema from the other nodes is unexpectedly slow, then 
> we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait longer on each latch, there are cases where this doesn't help, 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests, as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> from getting stuck for extended periods as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed-out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-05-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102374#comment-17102374
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:50 AM:


Hi [~bdeggleston],

Commenting on the design issues: I am not completely sure whether the issues 
you are talking about are introduced by this patch or already exist. We could 
indeed focus on the points you raised, but it seems to me that the current 
(committed) code is worse without this patch than with it, since I suspect 
these problems are already there.

Isn't the goal here to have all nodes on the same schema version? Isn't the 
very fact that there are multiple versions strange to begin with, so we should 
not even try to join a node while they mismatch, and hence there is nothing to 
deal with in the first place?
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not 
all
{quote}
This is most likely not true, unless I am missing something. The node being 
bootstrapped will never advance unless all nodes are on the same version.
{quote}For instance, if a single node is reporting a schema version that no one 
else has, but the node is unreachable, what do we do?
{quote}
We should fail the whole bootstrap, and someone should go and fix it. Besides, 
how can a node report its schema while being unreachable?
{quote}Next, I like how this limits the number of messages sent to a given 
endpoint, but we should also limit the number of messages we send out for a 
given schema version. If we have a large cluster, and all nodes are reporting 
the same version, we don't need to ask every node for its schema.
{quote}
Got you, this might be tracked.
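For illustration, that per-version cap could look roughly like the sketch 
below (hypothetical names throughout; nothing here is from the patch):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Group live endpoints by the schema version they report and pull from at
// most MAX_PER_VERSION of each group, instead of messaging every node that
// reports a version we do not have yet.
final class PerVersionRequestLimiter
{
    private static final int MAX_PER_VERSION = 2;

    static List<String> endpointsToQuery(Map<String, UUID> endpointVersions, UUID localVersion)
    {
        Map<UUID, List<String>> byVersion = new HashMap<>();
        endpointVersions.forEach((endpoint, version) -> {
            if (!version.equals(localVersion))
                byVersion.computeIfAbsent(version, v -> new ArrayList<>()).add(endpoint);
        });

        List<String> targets = new ArrayList<>();
        for (List<String> group : byVersion.values())
            targets.addAll(group.subList(0, Math.min(MAX_PER_VERSION, group.size())));
        return targets;
    }
}
{code}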

 

When it comes to testing, I admit that adding the isRunningForcibly method 
feels like a hack, but I had a very hard time testing this. It was basically 
the only reasonable way available at the time I was coding it; if you know of 
a better approach, please tell me, otherwise I am not sure what would be 
better here and we could stick with this for the time being. The whole testing 
methodology is based on these callbacks and checking their inner state, which 
results in methods that accept them so we can inspect that state. Without 
"injecting" them from outside, I would not be able to do that.
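A toy sketch of that injection idea (everything here is hypothetical except 
the general shape: callbacks are handed in from outside, so a test can assert 
on their observed state; run with -ea to enable the assert):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// The runner accepts its failure callbacks from outside rather than creating
// them internally, which is what makes their state observable from a test.
final class MigrationRunner
{
    private final List<Consumer<String>> failureCallbacks;

    MigrationRunner(List<Consumer<String>> failureCallbacks)
    {
        this.failureCallbacks = failureCallbacks;
    }

    void onFailure(String endpoint)
    {
        failureCallbacks.forEach(cb -> cb.accept(endpoint));
    }
}

class MigrationRunnerTest
{
    public static void main(String[] args)
    {
        List<String> failures = new ArrayList<>();
        MigrationRunner runner = new MigrationRunner(List.of(failures::add));
        runner.onFailure("10.0.0.3");
        assert failures.contains("10.0.0.3") : "callback should have recorded the endpoint";
        System.out.println("recorded failures: " + failures);
    }
}
{code}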



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-05-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102374#comment-17102374
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:42 AM:


Hi [~bdeggleston],

Commenting on the design issues: I am not completely sure whether the issues 
you are talking about are introduced by this patch or already exist. We could 
indeed focus on the points you raised, but it seems to me that the current 
(committed) code is worse without this patch than with it, since I suspect 
these problems are already there.

Isn't the goal here to have all nodes on the same schema version? Isn't the 
very fact that there are multiple versions strange to begin with, so we should 
not even try to join a node while they mismatch, and hence there is nothing to 
deal with in the first place?
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not 
all
{quote}
This is most likely not true, unless I am missing something. The node being 
bootstrapped will never advance unless all nodes are on the same version.
{quote}For instance, if a single node is reporting a schema version that no one 
else has, but the node is unreachable, what do we do?
{quote}
We should fail the whole bootstrap, and someone should go and fix it. Besides, 
how can a node report its schema while being unreachable?
{quote}Next, I like how this limits the number of messages sent to a given 
endpoint, but we should also limit the number of messages we send out for a 
given schema version. If we have a large cluster, and all nodes are reporting 
the same version, we don't need to ask every node for its schema.
{quote}
-I am sorry, I am not following what you are saying here, in particular the 
very last sentence. I think a schema pull (message) is only ever sent when the 
schema version reported via the Gossiper differs; only then do we send a 
message.-

I am taking this back; you might be right here. I see what you mean, but this 
makes the whole solution even more complicated.

When it comes to testing, I admit that adding the isRunningForcibly method 
feels like a hack, but I had a very hard time testing this. It was basically 
the only reasonable way available at the time I was coding it; if you know of 
a better approach, please tell me, otherwise I am not sure what would be 
better here and we could stick with this for the time being. The whole testing 
methodology is based on these callbacks and checking their inner state, which 
results in methods that accept them so we can inspect that state. Without 
"injecting" them from outside, I would not be able to do that.



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-05-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102374#comment-17102374
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:34 AM:


Hi [~bdeggleston],

Commenting on the design issues: I am not completely sure whether the issues 
you are talking about are introduced by this patch or already exist. We could 
indeed focus on the points you raised, but it seems to me that the current 
(committed) code is worse without this patch than with it, since I suspect 
these problems are already there.

Isn't the goal here to have all nodes on the same schema version? Isn't the 
very fact that there are multiple versions strange to begin with, so we should 
not even try to join a node while they mismatch, and hence there is nothing to 
deal with in the first place?
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not 
all
{quote}
This is most likely not true, unless I am missing something. The node being 
bootstrapped will never advance unless all nodes are on the same version.
{quote}For instance, if a single node is reporting a schema version that no one 
else has, but the node is unreachable, what do we do?
{quote}
We should fail the whole bootstrap, and someone should go and fix it. Besides, 
how can a node report its schema while being unreachable?
{quote}Next, I like how this limits the number of messages sent to a given 
endpoint, but we should also limit the number of messages we send out for a 
given schema version. If we have a large cluster, and all nodes are reporting 
the same version, we don't need to ask every node for its schema.
{quote}
I am sorry, I am not following what you are saying here, in particular the 
very last sentence. I think a schema pull (message) is only ever sent when the 
schema version reported via the Gossiper differs; only then do we send a 
message.

When it comes to testing, I admit that adding the isRunningForcibly method 
feels like a hack, but I had a very hard time testing this. It was basically 
the only reasonable way available at the time I was coding it; if you know of 
a better approach, please tell me, otherwise I am not sure what would be 
better here and we could stick with this for the time being. The whole testing 
methodology is based on these callbacks and checking their inner state, which 
results in methods that accept them so we can inspect that state. Without 
"injecting" them from outside, I would not be able to do that.
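The gate being described could be reduced to something like this (a 
hypothetical sketch of my reading, not the patch's code):

{code:java}
import java.util.UUID;

// A pull is only issued when the version a peer reports via gossip differs
// from ours and we have not already asked that peer.
final class PullDecision
{
    static boolean shouldPull(UUID localVersion, UUID gossipReportedVersion, boolean alreadyRequested)
    {
        return !localVersion.equals(gossipReportedVersion) && !alreadyRequested;
    }

    public static void main(String[] args)
    {
        UUID local = UUID.randomUUID();
        System.out.println(shouldPull(local, local, false));             // false: on par
        System.out.println(shouldPull(local, UUID.randomUUID(), false)); // true: diverged, not asked yet
    }
}
{code}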



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-04-28 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093752#comment-17093752
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 4/28/20, 7:35 PM:
-

Hi [~bdeggleston]

I took the patch and reworked it a little bit:

[https://github.com/smiklosovic/cassandra/tree/CASSANDRA-15158-2]

Looking forward to some feedback!






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

2020-04-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093752#comment-17093752
 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 4/27/20, 7:19 PM:
-

Hi [~bdeggleston]

I took the patch and reworked it a little bit:

[https://github.com/smiklosovic/cassandra/commits/CASSANDRA-15158]

You said there is no way to confirm that all in-flight migration tasks have 
been completed and applied. I do not know how to verify that this was indeed 
done, but what I did is expose whether a particular migration task failed to 
process (based on the onFailure callback), so we can build on this further if 
necessary.

Logically it is the same as the original patch, but the code is reorganised a 
bit. The "escape hatch" is one global bootstrap timeout; if it elapses and the 
schemas are still not in agreement, it is still unclear to me what we want to 
do: either fail completely and halt that node, or allow it to proceed with a 
big fat warning.

Looking forward to some feedback!
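The failure exposure could take roughly this shape (MigrationTaskSketch and 
its future-based plumbing are my hypothetical rendering, not the reworked 
branch's code):

{code:java}
import java.util.concurrent.CompletableFuture;

// Each task completes a future on response or failure, so the bootstrap code
// can see that a given in-flight task failed instead of silently timing out.
final class MigrationTaskSketch
{
    private final CompletableFuture<Void> completion = new CompletableFuture<>();

    void onResponse()                { completion.complete(null); }
    void onFailure(Throwable reason) { completion.completeExceptionally(reason); }

    boolean failed()
    {
        return completion.isCompletedExceptionally();
    }
}
{code}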






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org