Hi Benoit,

> Regarding the Prometheus setup, we have implemented an optional HTTP
> endpoint. I would recommend including the following line in your
> webadmin.properties:
>
> extensions.routes=org.apache.james.webadmin.dropwizard.MetricsRoutes
>

Thanks!  I'll have a look.
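Once the extension is enabled, a quick way to check what it exposes is to fetch it and parse the Prometheus text format. Below is a minimal Python sketch. It assumes the route is served as GET /metrics on the webadmin port. The localhost:8000 address is my guess and is not stated in this thread.

```python
import urllib.request

# Hypothetical address: adjust host/port to match your webadmin.properties.
METRICS_URL = "http://localhost:8000/metrics"


def parse_exposition(text):
    """Parse Prometheus text-format samples into {name_with_labels: value}.

    '# HELP' / '# TYPE' comments and blank lines are skipped, and an optional
    trailing timestamp is ignored. Label values are kept verbatim (escape
    sequences are not decoded): this is a sanity-check helper, not a
    complete parser.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split after the label set so label values may contain spaces.
            end = line.rindex("}") + 1
            name, rest = line[:end], line[end:].split()
        else:
            name, *rest = line.split()
        if not rest:
            continue  # malformed line without a value
        samples[name] = float(rest[0])  # float() accepts NaN, +Inf, -Inf
    return samples


def scrape(url=METRICS_URL, timeout=5):
    """Fetch the endpoint over HTTP and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For example, run print(scrape()) against a running instance to see which metric names come out before wiring up a Prometheus scrape job.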


> We should then:
>  - better document this setup
>  - (provide a docker-compose?)
>  - update the metrics boards in /grafana-reporting
>

Definitely, since this is currently very hard to discover without being
told. Nothing in the name remotely suggests that this is potentially
compatible with Prometheus pulls, and the name MetricsRoutes doesn't seem
to be mentioned anywhere in the documentation.

That said, my target deployment[1] would have separate instances for:
 - relay SMTP (with read access to the DB)
 - webadmin (administrative VPN)
 - mailbox and normal SMTP (user VPN, with read access to the user/domain DB)
I like to keep my attack surfaces small, and the relay must be exposed to
the general internet to be of any use, so the less code it runs, the better.
I will have to see if I can configure my assembly to run a restricted
webadmin with only the metrics routes.

So in my ideal world there would be documentation for both pull and push :p
I'll try to contribute the push side if I manage to make it work.
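For the push side, what actually goes over the wire is small. Below is a minimal Python sketch that assumes a Prometheus Pushgateway on its default port 9091. The gateway address, job name, label names and metric names are hypothetical placeholders. They are illustrative only and are not an existing James configuration.

```python
import urllib.parse
import urllib.request

PUSHGATEWAY = "http://localhost:9091"  # hypothetical gateway address


def render_exposition(samples, metric_type="gauge"):
    """Render {metric_name: value} as Prometheus text format.

    Every line, including the last one, must end with a newline for the
    Pushgateway to accept the body.
    """
    lines = []
    for name in sorted(samples):
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {samples[name]}")
    return "\n".join(lines) + "\n"


def grouping_url(base, job, labels=None):
    """Build base/metrics/job/<job>{/<label>/<value>}.

    Label values are assumed not to contain '/' (the Pushgateway needs a
    base64 encoding for that case, which this sketch does not implement).
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for key, value in (labels or {}).items():
        path += "/" + urllib.parse.quote(key, safe="")
        path += "/" + urllib.parse.quote(value, safe="")
    return base.rstrip("/") + path


def push(samples, job, labels=None, base=PUSHGATEWAY, timeout=5):
    """PUT the samples; this replaces everything pushed under the same grouping key."""
    req = urllib.request.Request(
        grouping_url(base, job, labels),
        data=render_exposition(samples).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example: push({"smtp_relay_connections": 12}, "james-relay", {"instance": "relay-1"}). PUT suits periodic full snapshots. POST would only replace metrics with the same names in that group.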

> Would you agree with such an approach?
>

Definitely. As I said, I don't like ES for metrics either, so a better
alternative is welcome.

> > Having the metrics output to logs doesn't help much because of the amount
> > of processing required to get anything useful out of it (or I failed to
> > find an easy way to ingest it; please correct me)
>
> Logging of metrics was introduced by Matthieu to get metrics out in
> the logs of a performance test platform, without the need to analyze
> the metrics "live".
>

That's interesting information! Not the part about Matthieu (no offense to
either of you), but the part about it not being intended for production
monitoring and possibly being useful for benchmark post-mortems. I would
definitely like to see this mentioned in
https://james.apache.org/server/metrics.html or maybe in
https://james.apache.org/server/monitor-logging.html

> We are moving away from JMX for the CLI; I see little reason to
> encourage its use here too.
>
> See https://www.cvedetails.com/cve/CVE-2017-12628/
>

I was under the impression that the Prometheus jmx_exporter was deployed as
an agent so it could access the MBeans directly, but it seems to scrape an
exposed JMX TCP endpoint, and I agree that this is not a good idea.

> Glowroot is an APM, allowing you to:
>  - capture slow traces
>  - capture DB queries (globally and on slow traces)
>  - capture flame graphs (which may be more or less relevant
> depending on the number of captures performed)


Yes, and also:
* Responsive UI with mobile support
* MBean attribute capture and **charts**
* Configurable alerting
* Historical rollup of all data (1m, 5m, 30m, 4h) with configurable
retention

Not to mention a centralized collector, which is why I said it overlaps
quite a bit with Prometheus.

> It provides some insight that regular monitoring tools will not give.
>

> I would categorize it more as a developer diagnosis tool (which to me is
> invaluable for performance issues).
>

Do you run it in your production environments in addition to whatever
metrics reporter you otherwise use, or only on dev/bench? Are there any
good practices around it? Is there any specific support for it in James?

Jean

[1] This is not currently the case in my Terraform/Helm recipe, but it is
my target.
