[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821743#comment-17821743 ] Alex Petrov commented on CASSANDRA-17401: - [~paulo] I have ran both tests and could not make them fail with this change. I've also looked at the code again and it looks like in cases where we would previously evict, in the new version we would just insert the missing one. That said, I do not see this as a correctness issue (not that it should not be still improved upon), so I would probably focus on 17248. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Ivan Senic >Assignee: Jaydeepkumar Chovatia >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821475#comment-17821475 ] Paulo Motta commented on CASSANDRA-17401: - Thanks for the detailed reports and repro steps. I've taken a look and this looks to me to be a legitimate race condition that can cause a re-prepare storm under large concurrency and unlucky timing. My understanding is that [these evict statements|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L735] are not required for the correctness of the upgrade compatibility logic and can be safely removed. Would you have some cycles to confirm this [~ifesdjeen] ? In addition to this, I think there's a pending issue from CASSANDRA-17248 that can leak prepared statements between keyspaces during mixed upgrade mode. Since these issues are in a related area I think it makes sense to address them together (in separate commits) to ensure these changes are tested together. I think the {{PreparedStatementCollisionTest}} suite from [this commit|https://github.com/apache/cassandra/pull/1872/commits/758bc4a89d7ca9d0bfe27e6f41000484724261bc] can help improve the validation coverage of this logic. That change looks correct to me but may need some cleanup. We should probably keep the metric changes out of this to keep the scope of this patch to a minimum. After proper review and validation I think there's value in including these fixes in the final 3.X releases to address these outstanding issues as users will still do upgrade cycles as 5.x release approaches. This will make resolution more laborious as we will need to provide patches for 3.x all the way up to trunk + CI for all branches. What do you think [~brandon.williams] [~stefan.miklosovic] ? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Ivan Senic >Assignee: Jaydeepkumar Chovatia >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812033#comment-17812033 ] Paulo Motta commented on CASSANDRA-17401: - [~chovatia.jayd...@gmail.com] I was not able to review this yet, will send an update when I get a chance to review it. If anyone else subscribed wants to review this on the meantime feel free to take it. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812009#comment-17812009 ] Jaydeepkumar Chovatia commented on CASSANDRA-17401: --- Hi [~paulo], Would appreciate it if you could spare time to review the pull request. Jaydeep > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809755#comment-17809755 ] Long Pan commented on CASSANDRA-17401: -- [~chovatia.jayd...@gmail.com] and I managed to reproduce the issue. Basically a large number of QPS and client connections are necessary to reproduce it. Here are the steps to reproduce: *Server Setup:* A 3-node Cassandra cluster. Each node with 64GB mem (16GB heap), 7 CPU cores. ({{{}native_transport_max_threads = 1024){}}} *Keyspace/Table:* CREATE KEYSPACE test_ks WITH REPLICATION = \{ ‘class’ : ‘NetworkTopologyStrategy’, ‘datacenter1’ : 3 } ; CREATE TABLE test_ks.table1 ( p_id text, c_id text, v text); *Client Setup:* 30 hosts. Each host run the following pseudo-code, using *GoCql* client: cluster.CQLVersion = "3.4.0" cluster.ProtoVersion = 4 cluster.Timeout = 5s cluster.ConnectTimeout = 10s cluster.NumConns = 3 cluster.Consistency = LocalQuorum cluster.RetryPolicy = SimpleRetryPolicy\{NumRetries: 1} cluster.SocketKeepalive = 20s cluster.HostSelectionPolicy = RoundRobinHostPolicy sessionCount = 30 qpsPerSession = 30 cqlQuery = "SELECT p_id,c_id,v FROM test_ks.table1 WHERE p_id = ? AND c_id = ?" for (i = 0; i < sessionCount; i++) \{ session = cluster.createSession rateLimiter = NewRateLimiter(qpsPerSession) newGoRoutine.run( sendReads(session, rateLimiter) ) } / * sendReads(session, rateLimiter) \{ for { newGoRoutine.run ( if (rateLimiter.allow) { session.execute(cqlQuery, randomString, randomString) } ) } } */ Traffic generated this way will result in ~10K coordiator QPS and ~3k client connections per Cassandra node. *Trigger Point:* Manually issue a CQL query to add a column in the table: “ALTER TABLE test_ks.table1 ADD new_col text;” {*}Symmpton{*}: Seconds after the trigger point, one or more Cassandra nodes will show number of native_transport threads reaching {{{}native_transport_max_threads{}}}, and pending native transport tasks grow endlessly. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809241#comment-17809241 ] Jaydeepkumar Chovatia commented on CASSANDRA-17401: --- Thank you, [~paulo] for your help with this ticket. Yes, it has occurred one time in production. After that, we could successfully reproduce a couple of times by injecting some delay in thread scheduling and by running a very high concurrency on _prepare_ code. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809211#comment-17809211 ] Paulo Motta commented on CASSANDRA-17401: - Ok thanks [~chovatia.jayd...@gmail.com]! I'm not familiar with this area but will try to look at it if I find cycles in the next few days and nobody beats me to it. :) Btw did you just observe a single occurrence of this issue, or is it recurrent? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809178#comment-17809178 ] Jaydeepkumar Chovatia commented on CASSANDRA-17401: --- Sure, [~paulo] Please note that reproducing this issue is extremely tricky as it depends on the thread contention, timing, etc., so I have injected some delay in preparing the statement code path to reproduce. If we run this test case on 3.0.25 or before, it does not reproduce, i.e., it is a regression since 3.0.26. Here is the PR that reproduces this issue: [https://github.com/apache/cassandra/pull/3058] Here is the proposed fix: [https://github.com/apache/cassandra/pull/3059] Please take a look at it and let me know your comments. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809139#comment-17809139 ] Paulo Motta commented on CASSANDRA-17401: - Hi [~chovatia.jayd...@gmail.com] can you provide a regression test case reproducing this issue and a patch with a proposed fix ? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808931#comment-17808931 ] Jaydeepkumar Chovatia commented on CASSANDRA-17401: --- We have been facing the exact same problem in our production environment. As part of CASSANDRA-17248 (in C* 3.0.26), the following code was introduced in [QueryProcessor.java|https://github.com/apache/cassandra/commit/242f7f9b18db77bce36c9bba00b2acda4ff3209e#r137491766] {code:java} // Make sure the missing one is going to be eventually re-prepared evictPrepared(hashWithKeyspace); evictPrepared(hashWithoutKeyspace); {code} This code could very well create a race condition between two calls of [_QueryProcessor::prepare_|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575] call in which one thread is adding and another thread is silently removing from the cache. Imagine there are thousands of threads calling the API, and then it might be possible for one thread to update the cache and another thread to remove it. If we look at the code of [Cassandra 3.0.25|https://github.com/apache/cassandra/blob/cassandra-3.0.25/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L391], then such eviction was not present. Hence, this seems like a regression since the 3.0.26 version of the Cassandra. To fix this issue, we should not *evict* the cache entries, i.e., the above-mentioned code path introduced since C* 3.0.26 is no longer required. [~ifesdjeen] , Could you please take a look at it? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497383#comment-17497383 ] Ivan Senic commented on CASSANDRA-17401: [~ifesdjeen] [~e.dimitrova] You can close the issue. I do understand the on execution of the prepared statement it will be re-prepared if the cache does not contain it. We ll adapt usage in Stargate. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496917#comment-17496917 ] Tatu Saloranta commented on CASSANDRA-17401: It seems to me that returning `null` – if that is to be expected – definitely needs to be well documented. It would also seem quite awkward for caller to essentially have to retry the operation, theoretically multiple times (more than 2 threads). Or perhaps I have misunderstood something. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496911#comment-17496911 ] Ivan Senic commented on CASSANDRA-17401: [~ifesdjeen] I have to admin that internals of Cassandra are not 100% familiar to me. However, wouldn't there be a _PreparedQueryNotFoundException_ thrown in the _ExecuteMessage_ or the _BatchMessage_ if the _QueryHandler.getPrepared_ returns null? I don't know how this is exception handled or how is the re-preparing actually working. Maybe I was not clear in my explanation above, but bottom line is that if the race condition is uncovering the issue described, the _QueryHandler.getPrepared_ will return null for the MD5 that was just returned by the same class prepare method call. If you think this is acceptable, then you can close the issue. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496878#comment-17496878 ] Alex Petrov commented on CASSANDRA-17401: - [~ivansenic] thank you for reporting this. While I can imagine how something like this may happen, I do not understand how it is bad or dangerous, could you elaborate? It is not in the cache, so it will get re-prepared. I'm fairly certain the fuzz test included with the patch covers the this. What specifically are you suggesting to do, since the issue just describes a behaviour and calls it a bug but gives little detail besides that. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496820#comment-17496820 ] Ekaterina Dimitrova commented on CASSANDRA-17401: - [~ifesdjeen] , [~marcuse] , do you mind to take a look, please? Looking at the links provided, seems related to the issues you were working on? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org