[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-24 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091505#comment-17091505
 ] 

Benjamin Lerer commented on CASSANDRA-15667:


Jenkin CI | 
[[4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/74/]|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/75/]|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/76/]|

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
> Attachments: log_bootstrap_resumable
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-14 Thread Sergio Bossa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083146#comment-17083146
 ] 

Sergio Bossa commented on CASSANDRA-15667:
--

[~maxtomassi], [~e.dimitrova], thanks for your PRs. I'm +1 to both.

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
> Attachments: log_bootstrap_resumable
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-14 Thread Massimiliano Tomassi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083119#comment-17083119
 ] 

Massimiliano Tomassi commented on CASSANDRA-15667:
--

[~jasonstack], sorry for the delay. I've updated my first comment with links to 
PRs against 3.0 and 3.11 too.

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
> Attachments: log_bootstrap_resumable
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-10 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081026#comment-17081026
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15667:
-

Attached a log of 100 successful runs.
(Turned out the issue was false alarm, my local config got messed up, nothing 
related to the test or the patch)

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-10 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080472#comment-17080472
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15667:
-

I am looking now at the test as I triggered it to run again 100 times yesterday 
and it seems there is some error. I am gonna figure it out and update the 
ticket later


> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-10 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080281#comment-17080281
 ] 

ZhaoYang commented on CASSANDRA-15667:
--

[~maxtomassi] thanks for the patch.. should we also fix 3.0 and 3.11 ?

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-09 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079380#comment-17079380
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15667:
-

Hi Sergio,
Thanks for the patch and the broad explanation. 

I just returned the test 
[here|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15667]

Unfortunately, I am not a committer so anyone else should do it. Thanks again!

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-09 Thread Sergio Bossa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079258#comment-17079258
 ] 

Sergio Bossa commented on CASSANDRA-15667:
--

Thanks [~e.dimitrova] for chiming in.
{quote}From what I recall the bootstrap was sometimes completing too fast 
before the streaming is really interrupted from the byteman code and we didn't 
really have an instrument to control that.
{quote}
I went through the "resumable bootstrap" test, and unfortunately I don't see 
how the bootstrap could ever complete before the byteman script is invoked: 
this is because such script [makes node 1 fail before it starts to stream 
files|[https://github.com/ekaterinadimitrova2/cassandra-dtest/blob/b56887d67c353d6d69cd60cfd74859405fa37685/byteman/4.0/stream_failure.btm#L10]],
 which means there's no way for node 3 to finish bootstrapping before it 
received all files from both nodes, which will never happen due to said script 
causing node 1 to fail.

So why did the test fail?

I believe that's because of this issue: in other words, node 3 was correctly 
seeing its streaming session completed (after node 1 finished streaming with an 
error) but *not* failed; this is because the "completed" state is read through 
the actual session state, while the "failed" state is read through the 
{{SessionInfo}} state, which is what we're fixing here.

That said, I would propose to still re-introduce the original 
{{resumable_bootstrap_test}}, because it's an important enough feature to 
deserve its own test, and it uses 3 nodes which increases the chances of 
detecting errors/races.

Thoughts?

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-07 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077514#comment-17077514
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15667:
-

Just to be sure you saw that - resumable bootstrap is tested in another test 
too, that's why this one was removed, not blindly removed because it was flakey 
or something. Fix was applied to the other one but it was pointed in the ticket 
that in extremely rare cases it might fail. Why? From what I recall the 
bootstrap was sometimes completing too fast before the streaming is really 
interrupted from the byteman code and we didn't really have an instrument to 
control that. So the only fix was to introduce bigger delay which solves the 
issue in most of the cases.  I think I haven't seen this test failing anymore 
but it might be really possible in very rare cases. 

There is in parallel additional ticket to try to implement the test in the 
in-jvm test suite as according to Jordan West who is the Shepard for testing 
4.0,  there we might be able to control the flow deterministically. 

I will look In detail at this patch and the test again later today or tomorrow, 
thanks a lot for bringing it up!


> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-04-06 Thread Massimiliano Tomassi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076236#comment-17076236
 ] 

Massimiliano Tomassi commented on CASSANDRA-15667:
--

PR: [https://github.com/maxtomassi/cassandra/pull/1]
CircleCI: 
[https://app.circleci.com/pipelines/github/maxtomassi/cassandra?branch=15667-4.0]
 (JVM dtests failed running)

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org