Hello all,

Recently, as part of our work documenting administration procedures for
the Distributed Guice James product, we have been reflecting on how to
conduct monitoring, which led to some interesting discussions.

Currently, monitoring of `mailbox event processing` and `mail
processing` can be achieved via logs (i.e. ERROR log review, etc.).

However, relying on logs requires a correct Kibana configuration, and
the logs themselves must carry good information. Even then:
 - It is not trivial to distinguish retries from the final try
 - Admins generally monitor logs using a time window. Events older than
this time window are ignored.

We can think of several mechanisms to improve this situation:

 - Having a health check, for instance a
MailboxEventProcessingHealthCheck ensuring that dead-letter is empty, or
returning "degraded" otherwise
 - Having a metric displayed on a dashboard. For the dead-letter example,
a boolean text field can be enough.
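
To make the two options concrete, here is a minimal sketch of what they
could compute. It is deliberately self-contained and does not use the
actual James HealthCheck API: the class name, the Status enum and the
LongSupplier standing in for "number of events in dead-letter" are all
assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch only, not the actual James HealthCheck API. */
public class DeadLetterHealthSketch {

    /** Statuses mirroring the "degraded" wording used in this thread. */
    public enum Status { HEALTHY, DEGRADED }

    // Assumed source of the number of events currently stored in dead-letter.
    private final LongSupplier deadLetterCount;

    public DeadLetterHealthSketch(LongSupplier deadLetterCount) {
        this.deadLetterCount = deadLetterCount;
    }

    /** Health check option: healthy only when dead-letter is empty. */
    public Status check() {
        return deadLetterCount.getAsLong() == 0 ? Status.HEALTHY : Status.DEGRADED;
    }

    /** Metric option: a boolean that a dashboard could show as a text field. */
    public boolean deadLetterIsEmpty() {
        return deadLetterCount.getAsLong() == 0;
    }

    public static void main(String[] args) {
        DeadLetterHealthSketch empty = new DeadLetterHealthSketch(() -> 0L);
        DeadLetterHealthSketch failing = new DeadLetterHealthSketch(() -> 3L);
        System.out.println(empty.check());   // prints HEALTHY
        System.out.println(failing.check()); // prints DEGRADED
    }
}
```

Both options read the same underlying value; they differ only in how it
is exposed to the admin.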

While interesting, the health check option has received the following
criticisms so far:
 - A perfectly behaving James server might report some failed processing
entries (for example on some borderline EML parsing), leading to a
degraded status for an otherwise perfectly working James server (for both
the mail processing and mailbox processing cases)
 - Through Grafana, the admin has the information directly available.
Currently, health checks require her to execute the health check via
webadmin. Requiring more actions is generally the best way of having
none of them taken.

We would be very interested in feedback on this topic, in order to offer
a friendlier admin experience.

Best regards,

Benoit


