[ https://issues.apache.org/jira/browse/JAMES-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Tellier closed JAMES-3900.
---------------------------------
    Resolution: Fixed

> Running task updates stalled on the Distributed task manager
> ------------------------------------------------------------
>
>                 Key: JAMES-3900
>                 URL: https://issues.apache.org/jira/browse/JAMES-3900
>             Project: James Server
>          Issue Type: Improvement
>          Components: task
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.8.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> While performing a long reindexing, we hit the following error:
> {code:java}
> reactor.core.Exceptions$ErrorCallbackNotImplemented: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
> Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
>     at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:207)
>     at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
>     at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
>     at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
>     at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
>     at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> After this error, scheduled updates for the task no longer happen.
> After investigation, errors raised while polling updates within SerialTaskManager
> are not handled, so the error terminates the whole subscription, which is the
> default Reactor behaviour.
> We likely should handle this error and prevent it from aborting the overall
> process. I will propose a PR doing just this.
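
The failure mode described above (one unhandled error in a periodic poll silently stops every later poll) can be reproduced with the JDK alone. The sketch below is only an analogue: it uses `ScheduledExecutorService` rather than Reactor, and it is not the SerialTaskManager code. `PollingErrorSketch`, `pollOnce` and `attemptedTicks` are hypothetical names introduced for illustration, and the simulated timeout stands in for the Cassandra `DriverTimeoutException`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PollingErrorSketch {
    // Hypothetical stand-in for one "poll task updates" step. It fails on one
    // chosen tick, the way a Cassandra read can time out.
    static void pollOnce(int tick, int failingTick) {
        if (tick == failingTick) {
            throw new IllegalStateException("Query timed out (simulated)");
        }
    }

    // Schedules the poll every 10 ms and returns how many ticks were attempted
    // before either wantedTicks was reached or the 1 s window elapsed.
    static int attemptedTicks(boolean handleErrors, int failingTick, int wantedTicks) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger ticks = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(wantedTicks);
        Runnable step = () -> {
            if (ticks.get() >= wantedTicks) {
                return; // enough ticks observed, ignore the extra ones
            }
            int tick = ticks.incrementAndGet();
            done.countDown();
            if (handleErrors) {
                try {
                    pollOnce(tick, failingTick);
                } catch (RuntimeException e) {
                    // Error stays contained in this tick (a real system would log it).
                }
            } else {
                // The exception escapes the Runnable: the executor suppresses
                // every later execution of this periodic task.
                pollOnce(tick, failingTick);
            }
        };
        try {
            scheduler.scheduleAtFixedRate(step, 0, 10, TimeUnit.MILLISECONDS);
            done.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return ticks.get();
    }

    public static void main(String[] args) {
        System.out.println("unhandled: " + attemptedTicks(false, 2, 5) + " ticks attempted");
        System.out.println("handled:   " + attemptedTicks(true, 2, 5) + " ticks attempted");
    }
}
```

With Reactor, the comparable approach would presumably be to handle the error inside the per-poll publisher (for instance with `onErrorResume`) so that the outer periodic sequence never sees it. Whether the actual fix takes that shape is not stated in this ticket.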
> Also, using event sourcing to manage task updates is a somewhat debatable
> choice... At one update every 30s, a task generating 10KB of JSON per update
> (not uncommon, e.g. if a task generates a large error report...) running
> for a week could easily generate 200MB of data being read at consistency
> level SERIAL from Cassandra, which is likely too much of an expectation to be
> honest... (not mentioning the *massive* deserialization effort...)
> As such, I propose to move polling updates management out of the aggregate
> and have a dedicated storage API for it. I will likely do it in a follow-up
> of this ticket...
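
The 200MB figure quoted in the description (one update every 30s, 10KB of JSON per update, a task running for a week) can be checked with a few lines of arithmetic. This is a rough sketch: `UpdateVolumeEstimate` and its methods are hypothetical names, and 10KB is taken as 10,000 bytes, which is an assumption.

```java
class UpdateVolumeEstimate {
    // Number of updates emitted over a run at a fixed interval.
    static long updateCount(long runSeconds, long intervalSeconds) {
        return runSeconds / intervalSeconds;
    }

    // Total payload accumulated in the event history for that many updates.
    static long totalBytes(long updates, long bytesPerUpdate) {
        return updates * bytesPerUpdate;
    }

    public static void main(String[] args) {
        long oneWeekSeconds = 7L * 24 * 3600;              // 604,800 s
        long updates = updateCount(oneWeekSeconds, 30);    // one update every 30 s
        long bytes = totalBytes(updates, 10_000);          // 10KB assumed to be 10,000 bytes
        System.out.println(updates + " updates, about " + bytes / 1_000_000 + " MB");
    }
}
```

That is 20,160 updates and 201,600,000 bytes in total, in line with the estimate in the description.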