Benoit Tellier created JAMES-3660:
-------------------------------------

             Summary: Cassandra mailbox creation unstable under high concurrency
                 Key: JAMES-3660
                 URL: https://issues.apache.org/jira/browse/JAMES-3660
             Project: James Server
          Issue Type: Improvement
            Reporter: Benoit Tellier


The test

org.apache.james.mailbox.cassandra.CassandraMailboxManagerTest$WithBatchSize.creatingConcurrentlyMailboxesWithSameParentShouldNotFail

is enough to trigger instability on the Apache CI:

https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-685/1/

{code:java}
Error Message

java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Stacktrace

java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Standard Output

11:29:54.751 [ERROR] o.a.j.u.c.ConcurrentTestRunner - Error caught during concurrent testing (iteration 0, threadNumber 1)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:90)
        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
        at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
        ... 25 common frames omitted
Wrapped by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
{code}

In short, LWT (lightweight transaction) usage alone is enough to create contention.

Looking closer at the issue, StoreMailboxManager does numerous defensive SERIAL 
reads (each doing an empty Paxos commit), which further degrade performance 
and increase contention.

I believe removing these defensive reads would make our code more stable.

Removing them made the 
creatingConcurrentlyMailboxesWithSameParentShouldNotFail test run about twice 
as fast (x2).
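
To illustrate why the defensive reads add contention, here is a toy model in plain Java. It is not James code and does not use the DataStax driver: FakeTable, serialExists and insertIfNotExists are hypothetical stand-ins, and the model simply assumes that a SERIAL read and an LWT write each cost one Paxos round. Under that assumption, checking for existence before the conditional insert doubles the number of Paxos rounds without changing the outcome, since the LWT already rejects duplicates by itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class DefensiveReadSketch {

    /** Hypothetical in-memory stand-in for a Cassandra table keyed by mailbox path. */
    static final class FakeTable {
        private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();
        final AtomicInteger paxosRounds = new AtomicInteger();

        /** Models a SELECT at consistency SERIAL: assumed to cost one Paxos round. */
        boolean serialExists(String path) {
            paxosRounds.incrementAndGet();
            return rows.containsKey(path);
        }

        /** Models INSERT ... IF NOT EXISTS: one Paxos round, returns true if applied. */
        boolean insertIfNotExists(String path, String mailboxId) {
            paxosRounds.incrementAndGet();
            return rows.putIfAbsent(path, mailboxId) == null;
        }
    }

    /** Defensive variant: SERIAL read first, then the conditional insert. */
    static boolean createWithDefensiveRead(FakeTable table, String path, String mailboxId) {
        if (table.serialExists(path)) {
            return false;
        }
        return table.insertIfNotExists(path, mailboxId);
    }

    /** LWT-only variant: rely on the conditional insert to detect duplicates. */
    static boolean createWithLwtOnly(FakeTable table, String path, String mailboxId) {
        return table.insertIfNotExists(path, mailboxId);
    }

    public static void main(String[] args) {
        FakeTable defensive = new FakeTable();
        FakeTable lwtOnly = new FakeTable();
        for (int i = 0; i < 10; i++) {
            createWithDefensiveRead(defensive, "parent.child" + i, "id" + i);
            createWithLwtOnly(lwtOnly, "parent.child" + i, "id" + i);
        }
        // prints: defensive read + LWT: 20 Paxos rounds
        System.out.println("defensive read + LWT: " + defensive.paxosRounds.get() + " Paxos rounds");
        // prints: LWT only: 10 Paxos rounds
        System.out.println("LWT only: " + lwtOnly.paxosRounds.get() + " Paxos rounds");
    }
}
```

Both variants refuse to create the same path twice; only the number of Paxos rounds differs. The model ignores timeouts and retries, so it shows the extra load but not the instability itself.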
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
