Benoit Tellier created JAMES-3660:
-------------------------------------

             Summary: Cassandra mailbox creation unstable under high concurrency
                 Key: JAMES-3660
                 URL: https://issues.apache.org/jira/browse/JAMES-3660
             Project: James Server
          Issue Type: Improvement
            Reporter: Benoit Tellier
The org.apache.james.mailbox.cassandra.CassandraMailboxManagerTest$WithBatchSize.creatingConcurrentlyMailboxesWithSameParentShouldNotFail test alone is enough to trigger instability on the Apache CI:

https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-685/1/

{code:java}
Error Message

java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Stacktrace

java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Standard Output

11:29:54.751 [ERROR] o.a.j.u.c.ConcurrentTestRunner - Error caught during concurrent testing (iteration 0, threadNumber 1)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:90)
	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
	at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
	at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
	at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
	... 25 common frames omitted
Wrapped by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
{code}

In short, LWT usage alone is enough to create contention.

Looking closer at the issue, StoreMailboxManager performs numerous defensive SERIAL reads (each doing an empty Paxos commit), which further degrade performance and increase contention.

I believe removing these defensive reads would make our code more stable. Removing them made the gConcurrentlyMailboxesWithSameParentShouldNotFail test roughly twice as fast (x2).
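To illustrate why the defensive SERIAL reads add load, here is a minimal in-memory sketch. It is not James or DataStax driver code: `FakeMailboxTable`, `serialRead`, `insertIfNotExists` and the `parent` / `parent.childN` paths are all hypothetical names, and a `ConcurrentHashMap` stands in for a table written with `INSERT ... IF NOT EXISTS`. The sketch assumes each SERIAL read and each conditional insert costs one Paxos round, and simply counts rounds when several threads create children under the same parent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory model; it only counts simulated Paxos rounds. */
final class LwtContentionSketch {

    /** Stand-in for a Cassandra table accessed with lightweight transactions. */
    static final class FakeMailboxTable {
        private final ConcurrentHashMap<String, Boolean> rows = new ConcurrentHashMap<>();
        final AtomicInteger paxosRounds = new AtomicInteger();

        /** Models a SELECT at SERIAL consistency (assumed to cost one Paxos round). */
        boolean serialRead(String path) {
            paxosRounds.incrementAndGet();
            return rows.containsKey(path);
        }

        /** Models INSERT ... IF NOT EXISTS; returns true when the row was applied. */
        boolean insertIfNotExists(String path) {
            paxosRounds.incrementAndGet();
            return rows.putIfAbsent(path, Boolean.TRUE) == null;
        }
    }

    /** Defensive variant: SERIAL read first, conditional insert only if the row is absent. */
    static boolean createWithDefensiveRead(FakeMailboxTable table, String path) {
        if (table.serialRead(path)) {
            return false;
        }
        return table.insertIfNotExists(path);
    }

    /** Direct variant: rely on the outcome of the conditional insert itself. */
    static boolean createDirect(FakeMailboxTable table, String path) {
        return table.insertIfNotExists(path);
    }

    /** Each thread ensures the shared parent exists, then creates its own child. */
    static int run(boolean defensive, int threads) throws InterruptedException {
        FakeMailboxTable table = new FakeMailboxTable();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            String child = "parent.child" + i;
            pool.submit(() -> {
                if (defensive) {
                    createWithDefensiveRead(table, "parent");
                    return createWithDefensiveRead(table, child);
                }
                createDirect(table, "parent");
                return createDirect(table, child);
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return table.paxosRounds.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("direct rounds:    " + run(false, 8));
        System.out.println("defensive rounds: " + run(true, 8));
    }
}
```

In this model the direct variant always uses two rounds per thread, while the defensive variant uses more, with the exact count depending on thread interleaving. With the real driver, the equivalent of the direct variant would be checking whether the conditional write was applied rather than reading beforehand; whether that is safe for every call site in StoreMailboxManager is not something this sketch establishes.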