Re: Distributed James: make ElasticSearch indexing optional?

Matthieu Baechler Mon, 15 Jun 2020 07:16:27 -0700

On Mon, 2020-06-15 at 15:30 +0200, Raphaël Ouazana-Sustowski wrote:
> 

[...]


> > > I see many use cases where you would not need search, essentially
> > > based on automatic mail processing, which is a common James workflow.
> > Does it still make sense to support IMAP at this point? I'm almost sure
> > people would expect REST and/or MQ in this case, don't you think?
> 
> Standard vs non-standard API? So yes, it can make sense. I won't go
> further on this topic because, as you said, I don't know exactly the
> need for such a workflow, so if people are interested, please contribute
> to this discussion.

I have already talked to some people willing to use that very feature. And
regarding the IMAP protocol (without search) vs a non-standard REST API,
given that I have written a server acting as an IMAP client in the past, I
would choose the REST API by far.

[...]

> > > > Does disabling only ES in that context make sense at all for the
> > > > Distributed James *product*?
> > > > 
> > > > Shouldn't we craft a specific Distributed SMTP+POP product instead
> > > > that would remove all waste?
> > > It makes sense because it makes it easy to go back from one
> > > configuration to the other. Going back and forth between the scanning
> > > implementation and the ES one is pretty easy.
> > As long as you don't have real users with mails. How long will a
> > full reindex (which is supposed to be slow according to user complaints)
> > take with some terabytes of emails? Is that what you call "easy"? Because
> > having a Distributed Mail Server without a huge amount of data doesn't
> > make much sense.
> 
> It depends, the Distributed Mail Server currently covers the use case of
> high availability. So it can make sense outside of the big data world.

Oh, really? You are saying that somebody would deploy a 3-node
Cassandra cluster, a 2-node RabbitMQ cluster, and an object storage service
for high availability without having at least a TiB of data?

Given that SMTP has the backup MX feature built in, which already allows
dealing with service downtime?

I don't think this exists. Do you have evidence people are willing to
do that?

> 
> > So, let's be realistic: this switch, while possible with some
> > configuration, would be quite hard to handle properly in the real world
> > (it requires at least some ops and active monitoring).
> > 
> > > Having a new (potentially optimized) product could be great in some
> > > cases, but would totally go against this.
> > Can we have arguments?
> > 
> > Bundling too many use cases in a single product is not very appealing
> > to me because I suspect it will become too complex by doing too many
> > different things, confusing to users because we'll have to explain
> > carefully in which case a specific option makes sense, and hard to
> > maintain because it's hard to make good choices when we can't figure
> > out who our users are, etc.
> 
> What's the difference between explaining a configuration option and 
> explaining which product to choose?

In one case you are use-case based, which is easy to reason about.

In the other case you can combine things for whatever reason you want,
and thus it's harder to document (and to make choices as a dev team)
because you don't know how it's used.

> From my point of view, one product is comfortable.

As a user or as a developer of the James product?

> You know you have some configuration options that can give you such and
> such options. Several products force you to make the right choices at
> the very beginning of the project, when you don't yet know your
> requirements well enough to make the right choices.

You don't know at the beginning that you want to use your server without
real users doing search? I really doubt that.

[...]

> 
> Finally, the configuration option is already the subject of a pull
> request, and it seems to be much simpler than having a new product
> (in terms of quantity of code and impact on the deployment -- of course
> simplicity is very subjective: for example, Guice is simple for you, but
> it can be different for other people).

Yes, of course, it's less effort to add an option than to create a
dedicated product.

> If in the future a new product makes more sense, reverting this PR and
> building a product around this would not be too much of a burden. The
> other way around, going from 2 (potentially incompatible) products to
> only one, would be way harder in terms of the data migration we would
> have to implement.
> 
> That's also why I think the right choice for now is to add a
> configuration option.

What would be different? In the first case you have to support the feature
for at least one release after deprecation. You would have to document
the migration too: are users who rely on Noop search willing to deploy
an ES cluster just because you want to remove the support?

In the second case, you'll have to migrate to a supported product: same
deprecation process, same migration doc, etc.

I don't see how reverting is any different.

What is certain: integrating the option in a release will cost a lot in
the future, so we should think about it carefully.

-- Matthieu Baechler


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
