Hey Benoit,

> First I am happy to see this component raise such interest, and get so
> much attention!
>


For the record, I want to establish that I neither use nor actually like ES
for such use cases, so the ES part doesn't really matter to me.
I do care a lot about metrics collection, and I generally dislike being tied
to a single system.

> However it's very easy to spend loads of time on technical issues that
> deliver little value to end users.
> My Prometheus deployments work well and I don't intend to invest more
> on the topic. I don't think I will beat you on that ;-)
>

Today you (Linagora?) are happy with Prometheus, and the platform I target
supports it, so I will be using it too.
What worries me is the total lack of choice, and the difficulty of adding
such a choice to the current James (and I did try), despite it boasting a
metrics collection system that is supposed to have pluggable
implementations.
I do consider that metrics collection brings a lot of value to end users,
and that being able to integrate into an existing metrics collection
infrastructure is an important point for James adoption beyond your
deployments :D
So yes, when we are done with the Pulsar/blob repository thing, I will
probably spend some time on one of two things. Either I submit a couple of
modules for established, stable protocols that won't require much
maintenance beyond the occasional version bump (most likely Graphite and
collectd reporters, since they are first-party modules of Dropwizard
Metrics), or I implement OpenTelemetry in a way that doesn't require such
internal bindings, brings additional features such as tracing, and is more
actively maintained than Dropwizard Metrics.

> Also, ELK stack is not used for metrics.
>

Actually it is: APM and metrics ingestion have been part of the Elastic
solution for quite a while, not that I have actually used them.


> > However, there are many monitoring systems based on push instead of pull.
> > The debate has been going for a long time and will probably be going on
> > forever ( google lists 54 million results for 'push vs pull monitoring
> > systems' ) it would be nice for james to support at least one of each
> > model even if pull is currently the favored approach.
> >
> > Prometheus is popular these days because it is deployed internally in
> > kubernetes which is at the top of the hype curve but james should
> > probably support at least one push based protocol
> You know we `could` support many things, it does not mean that we should.
>
> We are a small community with limited means, the maintenance effort most
> of the time falls on the same people.
>
> Our efforts should be directed to things that are useful, not a
> deprecated metric module with alternatives that targets another Elastic
> Search version.
>

Note that I am not pushing to keep the ES module. I am pushing for at least
one push-based alternative to Prometheus out of the box, if only to
demonstrate and document how a Dropwizard Metrics reporter can be plugged
into James if you don't use Prometheus.
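
To make the push model concrete, here is a minimal sketch of what a push
reporter does on the wire, assuming Graphite's plaintext protocol (one
"<path> <value> <epoch seconds>" line per sample, sent to port 2003 by
default). It is not James or Dropwizard code: the class name, the metric
name and the host are hypothetical, and a real integration would more
likely rely on the Graphite reporter shipped with Dropwizard Metrics than
on hand-written socket code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not James or Dropwizard code.
class GraphitePushSketch {

    // One sample in Graphite's plaintext format: "<path> <value> <epoch seconds>\n".
    static String formatLine(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Push model: the application opens the connection and sends the sample itself,
    // instead of waiting for the monitoring system to come and scrape it.
    static void push(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        // Only prints the line. To actually send it, call
        // push("graphite.example.org", 2003, line) with a real host (this one is made up).
        System.out.print(formatLine("james.smtp.connections", 42.0, now));
    }
}
```

With a pull system such as Prometheus the direction is reversed: the
application exposes an endpoint and the collector connects to it on its own
schedule.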


> >   or at least document how to plug
> > one in for people who build their own assembly.
> +1 thanks for opening with consensus. (was about to do the same ;-) )
>
> I agree we could have an extension mechanism for metric exporters.
> Ideally underlying metric library should come up with an SPI so we, as
> application writers, don't need to care.
>

I'm not going so far as to want an SPI for Dropwizard Metrics reporters. As
I said, a couple of integrations for the reporters provided by Metrics
itself, serving to document how to integrate such a reporter, would be just
fine.
As I wrote above, I will eventually contribute such an integration if it has
not been done before I get to it.


> Though, if there's a potential move to open telemetry, then adding our
> own custom extension mechanism is likely not the right thing.
>

I would first attempt to implement the Graphite/collectd reporter binding,
then look into OpenTelemetry, since the former is supposed to be less work
than the latter.


> CF
>
> https://github.com/apache/james-project/blob/master/server/apps/distributed-app/docs/modules/ROOT/pages/operate/metrics.adoc
>
> Upon recent documentation on the topic we likely forgot to update legacy
> documentation. That's a fail that will be fixed timely.
>

I have been unable to navigate to that document on the current website :(
I am not sure how an end user is supposed to discover this ...

cheers
jean
