[ https://issues.apache.org/jira/browse/JAMES-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Tellier closed JAMES-3900.
---------------------------------
Resolution: Fixed
> Running task updates stalled on the Distributed task manager
> ------------------------------------------------------------
>
> Key: JAMES-3900
> URL: https://issues.apache.org/jira/browse/JAMES-3900
> Project: James Server
> Issue Type: Improvement
> Components: task
> Reporter: Benoit Tellier
> Priority: Major
> Fix For: 3.8.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Upon performing a long reindexing, we encountered the following error:
> {code:java}
> reactor.core.Exceptions$ErrorCallbackNotImplemented: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
> Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT5S
>     at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:207)
>     at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
>     at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
>     at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
>     at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
>     at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> After that, scheduled updates for the task no longer happened.
> After investigation, errors raised while polling updates within SerialTaskManager
> are not handled, so the whole subscription gets cancelled, which is the default
> Reactor behaviour.
> We should likely handle this error and prevent it from aborting the overall
> process. I will propose a PR doing just this.
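The failure mode described above can be reproduced without Cassandra or Reactor. The following is a plain Java sketch and not the code of the actual fix. `pollNaively` lets the first exception escape the polling loop, which mirrors how an unhandled error signal terminates a Reactor `Flux`. When the subscriber supplies no error callback, Reactor reports this as the `ErrorCallbackNotImplemented` seen in the trace. `pollResiliently` instead catches the failure inside each tick, so later ticks still run. In Reactor terms, this roughly corresponds to handling errors per tick (for example with `onErrorResume` inside the `flatMap`/`concatMap` applied to each interval tick) rather than on the outer stream. `ResilientPolling`, `flakyPoll` and the simulated timeout are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ResilientPolling {

    /** Naive loop: the first failing poll escapes the loop, so no later tick runs. */
    static List<String> pollNaively(int ticks, Callable<String> poll) {
        List<String> updates = new ArrayList<>();
        try {
            for (int i = 0; i < ticks; i++) {
                updates.add(poll.call());
            }
        } catch (Exception e) {
            // Like an unhandled error signal in Reactor: polling is over for good.
        }
        return updates;
    }

    /** Resilient loop: each tick contains its own failure, so later ticks still run. */
    static List<String> pollResiliently(int ticks, Callable<String> poll, Consumer<Exception> onError) {
        List<String> updates = new ArrayList<>();
        for (int i = 0; i < ticks; i++) {
            try {
                updates.add(poll.call());
            } catch (Exception e) {
                onError.accept(e); // report (e.g. log) and skip only this tick
            }
        }
        return updates;
    }

    /** Hypothetical poll that fails on its second call, mimicking the DriverTimeoutException. */
    static Callable<String> flakyPoll() {
        AtomicInteger calls = new AtomicInteger();
        return () -> {
            int n = calls.incrementAndGet();
            if (n == 2) {
                throw new RuntimeException("Query timed out after PT5S");
            }
            return "update-" + n;
        };
    }

    public static void main(String[] args) {
        System.out.println("naive:     " + pollNaively(4, flakyPoll()));
        System.out.println("resilient: " + pollResiliently(4, flakyPoll(),
                e -> System.out.println("poll failed: " + e.getMessage())));
    }
}
```

The naive loop stops after `update-1`. The resilient loop reports one failure and still collects `update-3` and `update-4`.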
> Also, using event sourcing to manage task updates is a somewhat debatable
> choice... At one update every 30s, a task generating 10KB of JSON (not uncommon,
> e.g. if a task generates a large error report...) running for a week could
> easily generate 200MB of data to be read at consistency level SERIAL from
> Cassandra, which is likely too much to expect, to be honest... (not to mention
> the *massive* deserialization effort...)
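The estimate above can be checked quickly. This assumes one stored event per update, 10KB taken as 10,000 bytes, and every event being read back when the task aggregate is loaded. Under those assumptions, a week of updates every 30s amounts to 20,160 events, or roughly 200MB. `UpdateVolumeEstimate` is a throwaway helper and is not part of James.

```java
public class UpdateVolumeEstimate {

    /** Number of update events stored for a task running runSeconds, one event per interval. */
    static long updateCount(long runSeconds, long updateIntervalSeconds) {
        return runSeconds / updateIntervalSeconds;
    }

    /** Bytes to read back when every stored update event is replayed (assumption: no snapshotting). */
    static long totalBytes(long runSeconds, long updateIntervalSeconds, long bytesPerUpdate) {
        return updateCount(runSeconds, updateIntervalSeconds) * bytesPerUpdate;
    }

    public static void main(String[] args) {
        long oneWeek = 7L * 24 * 60 * 60; // 604,800 seconds
        long updates = updateCount(oneWeek, 30);
        long bytes = totalBytes(oneWeek, 30, 10_000);
        // prints: 20160 updates, ~201 MB
        System.out.println(updates + " updates, ~" + bytes / 1_000_000 + " MB");
    }
}
```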
> As such, I propose to move the management of polling updates out of the
> aggregate and to have a dedicated storage API for it. I will likely do it in a
> follow-up of this ticket...
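The ticket does not specify the dedicated storage API. Here is a rough sketch under our own assumptions. A hypothetical `TaskUpdateStore` could keep only the latest serialized additional information per task and overwrite it on each poll. The data read for a task would then stay bounded by one snapshot (about 10KB in the example above) instead of growing with its run time. All names below are invented, including `saveLatest`, `latest`, `remove` and the JSON payload. The in-memory map stands in for what would presumably be a Cassandra table keyed by task id.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TaskUpdateStoreSketch {

    /** Hypothetical API: keeps only the latest progress snapshot of each running task. */
    interface TaskUpdateStore {
        void saveLatest(UUID taskId, String additionalInformationJson);

        Optional<String> latest(UUID taskId);

        void remove(UUID taskId);
    }

    /** In-memory stand-in for what would presumably be a table keyed by task id. */
    static final class InMemoryTaskUpdateStore implements TaskUpdateStore {
        private final Map<UUID, String> snapshots = new ConcurrentHashMap<>();

        @Override
        public void saveLatest(UUID taskId, String additionalInformationJson) {
            // Overwrite instead of append: one snapshot per task, whatever its run time.
            snapshots.put(taskId, additionalInformationJson);
        }

        @Override
        public Optional<String> latest(UUID taskId) {
            return Optional.ofNullable(snapshots.get(taskId));
        }

        @Override
        public void remove(UUID taskId) {
            // e.g. once the task completes and its final state is recorded elsewhere (assumption)
            snapshots.remove(taskId);
        }

        int size() {
            return snapshots.size();
        }
    }

    public static void main(String[] args) {
        InMemoryTaskUpdateStore store = new InMemoryTaskUpdateStore();
        UUID taskId = UUID.randomUUID();
        store.saveLatest(taskId, "{\"processed\":1000}");
        store.saveLatest(taskId, "{\"processed\":2000}");
        System.out.println(store.latest(taskId).orElse("none"));
        System.out.println(store.size());
    }
}
```

The trade-off is that intermediate updates are lost. This seems acceptable for progress reporting, but it would be a design decision for the follow-up rather than something the ticket states.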
--
This message was sent by Atlassian Jira
(v8.20.10#820010)