Hello all,
Today James relies on externally scheduled tasks to behave well.
Non-exhaustive examples of tasks that *could* be critical to James behaving
well and *may/should* be monitored:
- Blob deduplication garbage collection
- JMAP uploads clean-up
- Cassandra consistency checks/fixes
- Spam reports
- Auditing / fixing OpenSearch indexing
- Purging data within the deleted message vault
- ...
What can go wrong:
- The admin did not configure / set up the CRON
- The CRON does not execute properly
- There is an error running the tasks
- The task is never scheduled because task execution throughput is too
low, etc.
What I would love:
- A green button if required tasks are executed successfully
- An orange button if investigation is required because the task is
never scheduled
Overall supervision of James today revolves around the concept of
healthchecks:
- Periodically run
- Results exposed through the logs
- Callable via HTTP to interoperate with alerting stacks (prometheus /
load-balancer / Zabbix / ...)
- Hopefully, one day, displayed on the front page of the James
administration site...
So, the proposal is to implement a healthcheck supervising task
execution. One would configure
the tasks they expect to run successfully, and the time period within
which each of them should complete successfully.
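To make the proposed check a bit more concrete, here is a minimal,
self-contained Java sketch of its evaluation logic. It assumes we can
obtain, for each task type, the completion date of its latest successful
run; how that is retrieved (task manager, ...) is left open. All class,
record and enum names are hypothetical and do not reuse James's actual
healthcheck API; HEALTHY and DEGRADED stand for the green and orange
buttons described earlier.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not James's HealthCheck API: names are illustrative only.
public class TaskExecutionHealthCheck {

    // HEALTHY = "green button", DEGRADED = "orange button, investigation required".
    enum Status { HEALTHY, DEGRADED }

    // A task type the admin expects to succeed at least once per period.
    record ExpectedTask(String taskType, Duration period) {}

    private final List<ExpectedTask> expectedTasks;
    private final Clock clock;

    TaskExecutionHealthCheck(List<ExpectedTask> expectedTasks, Clock clock) {
        this.expectedTasks = List.copyOf(expectedTasks);
        this.clock = clock;
    }

    // lastSuccess maps a task type to the completion date of its latest
    // successful run. Returns the expected task types that are late.
    List<String> lateTasks(Map<String, Instant> lastSuccess) {
        Instant now = clock.instant();
        return expectedTasks.stream()
            .filter(task -> {
                Instant completed = lastSuccess.get(task.taskType());
                // Late if it never succeeded, or its last success is older than the period.
                return completed == null || completed.isBefore(now.minus(task.period()));
            })
            .map(ExpectedTask::taskType)
            .toList();
    }

    // Green only when no expected task is late, orange otherwise.
    Status check(Map<String, Instant> lastSuccess) {
        return lateTasks(lastSuccess).isEmpty() ? Status.HEALTHY : Status.DEGRADED;
    }
}
```

Returning the list of late task types, rather than a bare status, gives the
admin a starting point for the investigation. With no expected tasks
configured, nothing can be late, so the check always reports HEALTHY.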
We would add configuration properties in healthcheck.properties for
this. Specifying no tasks makes the check a noop, effectively disabling
it (default behaviour).
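The configuration side could be as small as a single property. The sketch
below assumes a hypothetical key, `healthcheck.tasks.expected`, holding a
comma-separated list of `taskType:duration` entries where durations use
the ISO-8601 format accepted by `java.time.Duration.parse`. Neither the
key nor the format exists in James today; both are illustrative. An absent
or blank value yields no expected tasks, which is the noop default.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: the property key and value format are illustrative,
// not an existing James configuration option.
public class ExpectedTasksConfiguration {

    static final String EXPECTED_TASKS_KEY = "healthcheck.tasks.expected";

    // Parses "type1:P7D,type2:PT24H" into task type -> expected period.
    // A missing or blank property returns an empty map: the check is a noop.
    static Map<String, Duration> parse(Properties properties) {
        String raw = properties.getProperty(EXPECTED_TASKS_KEY, "");
        Map<String, Duration> expected = new LinkedHashMap<>();
        for (String entry : raw.split(",")) {
            if (entry.isBlank()) {
                continue; // tolerate an empty value and stray commas
            }
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected <taskType>:<ISO-8601 duration>, got '" + entry.trim() + "'");
            }
            // Duration.parse throws DateTimeParseException on a malformed duration.
            expected.put(parts[0].trim(), Duration.parse(parts[1].trim()));
        }
        return expected;
    }
}
```

For instance, `healthcheck.tasks.expected=blob-gc:P7D,jmap-uploads-cleanup:PT24H`
(hypothetical task type names) would expect one task to succeed weekly and
the other daily. In a .properties file only the first `=` or `:` separates
the key from the value, so the colons inside the value need no escaping.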
I wish to contribute such a feature to the James project.
Alternatives I have:
- Do this as part of Linagora products (TMail) if this does not reach
consensus within the community
- Propose modularising healthchecks, allowing custom healthchecks, and
ship this one as an opt-in extension (via the extensions-jars loading
mechanism).
Thoughts?
Best regards
Benoit
---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org