Benoit Tellier created JAMES-3900:
-------------------------------------

             Summary: Running task updates stalled on the Distributed task manager
                 Key: JAMES-3900
                 URL: https://issues.apache.org/jira/browse/JAMES-3900
             Project: James Server
          Issue Type: Improvement
          Components: task
            Reporter: Benoit Tellier
             Fix For: 3.8.0

While performing a long reindexing, we encountered the following error:

{code:java}
reactor.core.Exceptions$ErrorCallbackNotImplemented: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
	at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:207)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
{code}

After which scheduled updates for the task no longer happen.

After investigation, errors raised while polling updates within SerialTaskManager are not handled, so the whole subscription is cancelled, which is the default Reactor behaviour.

We likely should handle this error and prevent it from aborting the overall process. I will propose a PR doing just this.

Also, using event sourcing for managing task updates is a somewhat debatable choice... At one update every 30s, a task generating 10KB of JSON per update (not uncommon, e.g. if a task generates a large error report...) running for a week could easily generate 200MB of data being read at consistency level SERIAL from Cassandra, which is likely too much to expect, to be honest... (not mentioning the *massive* deserialization effort...)

As such, I propose to move polling updates management out of the aggregate and to have a dedicated storage API for it.

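For what it is worth, here is a quick sketch of the arithmetic behind the 200MB figure. The class and method names are hypothetical (nothing here exists in James), and it assumes 10KB means 10,000 bytes and that every update carries the full payload.

```java
import java.time.Duration;

// Hypothetical helper, not part of James: rough estimate of how much
// update data a single long-running task appends to the event store.
final class UpdateVolumeEstimate {

    // Number of updates emitted when one update is published every `period`
    // over a total `runtime`.
    static long updateCount(Duration runtime, Duration period) {
        return runtime.getSeconds() / period.getSeconds();
    }

    // Total bytes stored, assuming each update carries `bytesPerUpdate` of JSON.
    static long totalBytes(Duration runtime, Duration period, long bytesPerUpdate) {
        return updateCount(runtime, period) * bytesPerUpdate;
    }

    public static void main(String[] args) {
        Duration runtime = Duration.ofDays(7);     // task running for a week
        Duration period = Duration.ofSeconds(30);  // one update every 30s
        long bytesPerUpdate = 10_000L;             // assumption: 10KB = 10,000 bytes

        System.out.println(updateCount(runtime, period) + " updates");
        System.out.println(totalBytes(runtime, period, bytesPerUpdate) / 1_000_000 + " MB");
    }
}
```

Under those assumptions a week-long task emits 20,160 updates, for about 201MB of events to read back and deserialize each time the aggregate is loaded.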
I will likely do it in a follow up of this ticket...