[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091412#comment-17091412 ]

ZhaoYang commented on CASSANDRA-15666:

thanks for the review and feedback~

> Race condition when completing stream sessions
>
>                 Key: CASSANDRA-15666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Streaming and Messaging
>            Reporter: Sergio Bossa
>            Assignee: ZhaoYang
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0-alpha
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{StreamSession#prepareAsync()}} executes, as the name implies,
> asynchronously from the IO thread: this opens up for race conditions between
> the sending of the {{PrepareSynAckMessage}} and the call to
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1,
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about,
> which is overall a win anyway.
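The two rules proposed in the issue description can be sketched as a toy state model. This is only an illustration under stated assumptions: the class `CompletionProtocolSketch`, its `Session` type, and the methods `onCompleteMessage()` and `onChannelClosedByPeer()` are hypothetical names, not Cassandra's actual `StreamSession` API, and message sending is modelled as appending to a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed exchange. Hypothetical names; not Cassandra's StreamSession. */
final class CompletionProtocolSketch {
    enum Role { INITIATOR, FOLLOWER }

    static final class Session {
        final Role role;
        final List<String> sent = new ArrayList<>(); // messages this side has sent
        boolean tasksDone;        // all local transfers and receives have finished
        boolean completeReceived; // the follower's CompleteMessage has arrived
        boolean closed;           // session and channels are closed

        Session(Role role) { this.role = role; }

        /** Safe to call from several threads and more than once. */
        synchronized void maybeCompleted() {
            if (closed || !tasksDone) return;
            if (role == Role.FOLLOWER) {
                // Rule 1: only the follower sends CompleteMessage, and only once.
                if (!sent.contains("CompleteMessage")) sent.add("CompleteMessage");
            } else if (completeReceived) {
                // Rule 2: only the initiator closes, after receiving CompleteMessage.
                closed = true;
            }
        }

        /** Initiator side: handle the follower's CompleteMessage. */
        synchronized void onCompleteMessage() {
            completeReceived = true;
            maybeCompleted();
        }

        /** Follower side: the initiator closed the channel, so clean up locally. */
        synchronized void onChannelClosedByPeer() {
            closed = true;
        }
    }

    /** Runs the happy path and reports whether both sides ended in the expected state. */
    static boolean happyPath() {
        Session initiator = new Session(Role.INITIATOR);
        Session follower = new Session(Role.FOLLOWER);

        initiator.tasksDone = true;
        initiator.maybeCompleted();       // must not close yet: no CompleteMessage received
        boolean initiatorWaited = !initiator.closed;

        follower.tasksDone = true;
        follower.maybeCompleted();        // e.g. from the prepareAsync() thread
        follower.maybeCompleted();        // e.g. from the IO thread: no second message
        boolean sentOnce = follower.sent.size() == 1 && !follower.closed;

        initiator.onCompleteMessage();    // the initiator now closes the session
        follower.onChannelClosedByPeer(); // the follower observes the close
        return initiatorWaited && sentOnce && initiator.closed && follower.closed;
    }

    public static void main(String[] args) {
        System.out.println("protocol ok: " + happyPath());
    }
}
```

In this model the follower never closes on its own initiative, so its `CompleteMessage` cannot be sent into a session that it has already torn down, which is the failure described in step 4 b).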
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090579#comment-17090579 ]

Benjamin Lerer commented on CASSANDRA-15666:

It looks good to me. Thanks for the patch and the reviews [~jasonstack] [~sbtourist]
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087966#comment-17087966 ]

Sergio Bossa commented on CASSANDRA-15666:

Good to merge for me.
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081858#comment-17081858 ]

ZhaoYang commented on CASSANDRA-15666:

| [patch|https://github.com/apache/cassandra/pull/497] | [dtest|https://github.com/apache/cassandra-dtest/pull/63] |

Previous changes:
* Synchronization on "StreamSession#maybeComplete()" to avoid race condition on streaming completion.
* Only the "follower" is allowed to send the CompleteMessage.
* Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.

New changes to fix dtest failures:
* NettyStreamingMessageSender:
** don't close channels in NettyStreamingMessageSender, as they will be closed by the initiator on "closeSession()"
** handle fileTransferExecutor pool shutdown gracefully in case of ClosedByInterruptException
** only include inbound handler for initiator's control channel
* ChannelProxy:
** Use a new "ChannelProxy" instance instead of a shared copy in the stream writer, to prevent an interrupted thread from closing the backing channel
* StreamSession: close session if channel is closed due to EOF from "StreamingInboundHandler"
* OutboundConnection: fix OutboundConnection.id() to use proper remote/local address
* Preview repair:
** Move follower's "completePreview()" from "prepareAck()" to "prepareAsync()", because the initiator will close the connection via "completePreview" on "prepareSynAck()"
** In case of preview, do not send "PrepareAckMessage" to the follower, as the follower has already closed the connection on "prepareAck()"
* Dtest: include "Socket closed before session completion" in {{ignore_log_patterns}}; it is logged when a stream session is closed upon EOF because a node went down.
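The ChannelProxy item relies on standard JDK behaviour: when a thread whose interrupt flag is set performs I/O on a `FileChannel`, the JDK closes that channel and throws `ClosedByInterruptException`. The sketch below illustrates only that general mechanism with plain `FileChannel` instances standing in for Cassandra's `ChannelProxy`; the class and method names are hypothetical and the code is not taken from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo; plain FileChannels stand in for Cassandra's ChannelProxy. */
final class InterruptClosesChannelSketch {

    /** Reads from the channel while the current thread's interrupt flag is set. */
    static void interruptedRead(FileChannel channel) throws IOException {
        Thread.currentThread().interrupt();   // simulate an interrupted stream writer thread
        try {
            channel.read(ByteBuffer.allocate(8));
        } catch (ClosedByInterruptException e) {
            // Expected: the JDK has closed this channel as a side effect of the interrupt.
        } finally {
            Thread.interrupted();             // clear the flag so later I/O is unaffected
        }
    }

    /**
     * Returns whether the shared channel is still open after an interrupted read.
     * With useOwnChannel == false the reader uses the shared channel directly;
     * with useOwnChannel == true it opens its own channel on the same file.
     */
    static boolean sharedOpenAfterInterrupt(boolean useOwnChannel) {
        try {
            Path file = Files.createTempFile("stream-sketch", ".bin");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel shared = FileChannel.open(file, StandardOpenOption.READ)) {
                if (useOwnChannel) {
                    try (FileChannel own = FileChannel.open(file, StandardOpenOption.READ)) {
                        interruptedRead(own); // only the private channel gets closed
                    }
                } else {
                    interruptedRead(shared);  // the shared channel itself gets closed
                }
                return shared.isOpen();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared open, direct use:  " + sharedOpenAfterInterrupt(false));
        System.out.println("shared open, own channel: " + sharedOpenAfterInterrupt(true));
    }
}
```

Giving each writer its own channel costs an extra file handle per transfer, but it confines the effect of an interrupt to the thread that was interrupted.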
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079527#comment-17079527 ]

Sergio Bossa commented on CASSANDRA-15666:

[~jasonstack] thanks for the followups, sent another round of comments.
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078472#comment-17078472 ]

ZhaoYang commented on CASSANDRA-15666:

bq. 1) Only the "follower" is allowed to send the CompleteMessage.
bq. 2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.

[~sbtourist] [~blerer] I have addressed the review feedback and included the modifications above. Do you mind having a look?
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078313#comment-17078313 ]

Benjamin Lerer commented on CASSANDRA-15666:

I put some comments on the PR.

It is always easier to fix some problems in major versions, as there are fewer constraints during upgrades. So unless we believe that it will take a long time, it is probably better to fix it in the scope of that ticket.
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077478#comment-17077478 ]

Sergio Bossa commented on CASSANDRA-15666:

{quote}Let's see what [~blerer] has to say{quote}

Let's not delay this fix further, unless [~blerer] really wants to chime in?
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074405#comment-17074405 ]

ZhaoYang commented on CASSANDRA-15666:

bq. Regarding the changes to the CompleteMessage exchange, I still think that'd be a win regardless if the race is fixed in a different way

Let's see what [~blerer] has to say..
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073831#comment-17073831 ]

Sergio Bossa commented on CASSANDRA-15666:

[~jasonstack] your fix looks good, just left a few minor comments on the PR.

Regarding the changes to the {{CompleteMessage}} exchange, I still think that'd be a win regardless if the race is fixed in a different way, as the current implementation makes it harder to reason about its correctness (which means it could be prone to other races), but I also understand we want to limit the scope of changes to 4.0, so not pushing hard for it.
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073052#comment-17073052 ]

Michael Semb Wever commented on CASSANDRA-15666:

||branch||circleci||jenkins||
|[trunk_15666|https://github.com/apache/cassandra/compare/trunk...jasonstack:CASSANDRA-15666-trunk]|[circleci|https://circleci.com/gh/jasonstack/workflows/cassandra/tree/CASSANDRA-15666-trunk]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/9/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/9]|
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068663#comment-17068663 ]

Sergio Bossa commented on CASSANDRA-15666:

bq. it's still possible to send 2 CompleteMessage by follower when maybeComplete() in prepareAsync() is delayed and race with maybeComplete() in taskCompleted().

Correct. I agree the overall thread safety of {{StreamSession}} should be fixed.
[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068591#comment-17068591 ]

ZhaoYang commented on CASSANDRA-15666:

{quote}4) At this point, if this was the only file to stream, both nodes are ready to close the session via maybeCompleted(), but:
a) Node A will call it twice from both the IO thread and the thread at #1, closing the session and its channels.
b) Node B will attempt to send a CompleteMessage, but will fail because the session has been closed in the meantime.
{quote}

This can be reproduced by delaying {{maybeComplete}} in {{prepareAsync}} until requests/transfers are empty on the follower side.

{quote}I believe the best fix would be to modify the message exchange so that:
1) Only the "follower" is allowed to send the CompleteMessage.
2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.
{quote}

The points above will definitely make the streaming state easier to reason about. But they may not be sufficient: the follower can still send two CompleteMessages when {{maybeComplete()}} in {{prepareAsync()}} is delayed and races with {{maybeComplete()}} in {{taskCompleted()}}.

1) Follower sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread and {{maybeComplete()}} is delayed.
2) Initiator receives it and starts streaming.
3) Follower receives the streamed files and sends {{ReceivedMessage}}.
4) Follower receives all streamed files and triggers {{maybeComplete()}} in {{taskCompleted}}.
5) Follower will send 2 {{CompleteMessage}} because of step 1) and step 4).

I think we also need to enhance synchronization on state transition and sending CompleteMessage. WDYT?
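One way to read "synchronization on state transition and sending CompleteMessage" is to make the check, the state change, and the send a single critical section, so that only the first caller wins. The following is a minimal sketch under that assumption; `GuardedCompletionSketch`, its two-value `State` enum, and the counter that stands in for sending a message are all hypothetical and are not the code from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a guarded completion step; not Cassandra's StreamSession. */
final class GuardedCompletionSketch {
    enum State { STREAMING, WAIT_COMPLETE }

    private State state = State.STREAMING;
    private final AtomicInteger completeMessagesSent = new AtomicInteger();

    /** Checks the state, transitions, and "sends" as one atomic step. */
    synchronized boolean maybeCompleted() {
        if (state != State.STREAMING) return false; // another caller already completed
        state = State.WAIT_COMPLETE;
        completeMessagesSent.incrementAndGet();     // stand-in for sending CompleteMessage
        return true;
    }

    int sentCount() { return completeMessagesSent.get(); }

    /** Races two callers, as in steps 1) and 4), and returns how many messages were sent. */
    static int raceOnce() {
        GuardedCompletionSketch session = new GuardedCompletionSketch();
        CountDownLatch start = new CountDownLatch(1);
        Runnable caller = () -> {
            try {
                start.await();                      // release both threads together
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            session.maybeCompleted();
        };
        Thread prepareAsync = new Thread(caller, "prepareAsync");
        Thread taskCompleted = new Thread(caller, "taskCompleted");
        prepareAsync.start();
        taskCompleted.start();
        start.countDown();
        try {
            prepareAsync.join();
            taskCompleted.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return session.sentCount();
    }

    public static void main(String[] args) {
        System.out.println("CompleteMessages sent: " + raceOnce());
    }
}
```

Whichever thread enters first performs the transition and the send; the second sees the new state and returns without sending, so step 5) can no longer produce a duplicate.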