Distributed James: make ElasticSearch indexing optional?

Raphaël Ouazana-Sustowski Thu, 11 Jun 2020 09:02:24 -0700

Hi,

Here is a proposal to make ElasticSearch optional in our distributed product/flavor/server.


Comments are welcome.


## Why?

Some people have expressed the need to use a distributed James without ElasticSearch:

- in some comment here: https://issues.apache.org/jira/browse/JAMES-3086

- one of our customers plans to deploy a distributed James server to serve POP3 encrypted emails. This deployment does not rely on search features. However, as part of the current Distributed James server, the customer is forced to rely on ElasticSearch email indexing.

This results in wasted resources, as maintaining an ElasticSearch cluster to keep up with the volume is expensive. Maintaining an ElasticSearch cluster when it is not needed is costly at several levels:

- cost of infrastructure to deploy it
- cost of people having to maintain it
- performance cost on James to unnecessarily index data

## How?

Scanning search is a search implementation that runs on top of any mailbox implementation, even distributed ones, and does not require indexing data.

Scanning search is tested at the component level (unit tests) and also passes IMAP (MPT) tests on top of the Cassandra implementation, as well as JMAP memory tests, and thus delivers correct results. Of course, it does not support full text search.
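
To make the idea concrete, here is a minimal sketch of what "scanning" means: every message of a mailbox is read and tested against the search criterion, and nothing is indexed beforehand. The `ScanningSearchSketch` class, its `Message` type and the `search` method are hypothetical names used for illustration only; this is not the James API nor the actual implementation in mailbox-store.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical, simplified model for illustration; not the James API.
final class ScanningSearchSketch {

    // Minimal stand-in for a stored message.
    static final class Message {
        final long uid;
        final String subject;
        final boolean seen;

        Message(long uid, String subject, boolean seen) {
            this.uid = uid;
            this.subject = subject;
            this.seen = seen;
        }
    }

    // Reads every message and keeps the UIDs matching the criterion.
    // No index is written at delivery time and none is read at search time,
    // so the cost of a search grows with the size of the mailbox.
    static List<Long> search(List<Message> mailbox, Predicate<Message> criterion) {
        return mailbox.stream()
            .filter(criterion)
            .map(message -> message.uid)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Message> mailbox = List.of(
            new Message(1, "Invoice", true),
            new Message(2, "Meeting notes", false),
            new Message(3, "Invoice reminder", false));

        // Flag-based criterion, similar in spirit to an IMAP "SEARCH UNSEEN".
        System.out.println(search(mailbox, message -> !message.seen));
    }
}
```

The trade-off is visible in the sketch: delivery pays no indexing cost, while each search pays a full pass over the mailbox, which is why this approach suits deployments that rarely search.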

We should allow Distributed James to optionally rely on scanning search instead of ElasticSearch.


- Scanning search should be recommended for deployments that rarely search data

- ElasticSearch should be recommended when search is frequent or requires high performance

We could use module choosing [1] to choose between scanning search and ElasticSearch.
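
As a rough sketch of how such a choice could look, assuming the configuration is read once at startup and then mapped to the module to install, so that the modules themselves contain no conditional statements: all names below (`SearchModuleChoosingSketch`, `SearchChoice`, `SearchModule`, the `scanning` / `elasticsearch` values) are hypothetical, and the `SearchModule` interface is only a stand-in for a Guice module to keep the example dependency-free.

```java
import java.util.Locale;

// Hypothetical sketch of module choosing; names and values are assumptions.
final class SearchModuleChoosingSketch {

    // Stand-in for a Guice module; a real one would bind a search implementation.
    interface SearchModule {
        String describe();
    }

    static final class ScanningSearchModule implements SearchModule {
        @Override
        public String describe() {
            return "scanning search, no indexing";
        }
    }

    static final class ElasticSearchModule implements SearchModule {
        @Override
        public String describe() {
            return "ElasticSearch indexing";
        }
    }

    // The configured choice, parsed once at startup.
    enum SearchChoice {
        SCANNING,
        ELASTICSEARCH;

        // Throws IllegalArgumentException on an unknown value, so that a
        // misconfiguration fails fast instead of silently picking a default.
        static SearchChoice parse(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    // The only place where the decision is taken: modules stay unconditional.
    static SearchModule chooseModule(SearchChoice choice) {
        switch (choice) {
            case SCANNING:
                return new ScanningSearchModule();
            case ELASTICSEARCH:
                return new ElasticSearchModule();
            default:
                throw new IllegalArgumentException("Unknown choice: " + choice);
        }
    }

    public static void main(String[] args) {
        // Assumption: the value would come from a configuration file in practice.
        String configured = args.length > 0 ? args[0] : "scanning";
        System.out.println(chooseModule(SearchChoice.parse(configured)).describe());
    }
}
```

With this shape, both search implementations ship in the same server, and switching from one to the other is a configuration change rather than a different product.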

Note that scanning search introduces no additional dependencies, as it is part of mailbox-store, and thus causes no risk of library clashes.

Note also that metric collection and log collection using ElasticSearch are unaffected.


## Alternative

The alternative would be to build a product/flavor/server separate from the distributed one, where the only difference from the distributed one is that search would rely on scanning instead of ElasticSearch indexing.

The maintenance cost of such a product/flavor/server is higher than that of a configuration option (Docker images to release, time and energy to run integration tests on it).

Such a product/flavor is hard to brand because, even if it answers a need, it is not far from the distributed one, and does not answer needs that are very far from it either.

The advantage is that it would allow finer tuning of this solution to answer the exact needs.


## Work in Progress

See pull request: https://github.com/linagora/james-project/pull/3425

Regards,

Raphaël.

[1] https://github.com/apache/james-project/blob/master/src/adr/0036-against-use-of-conditional-statements-in-guice-modules.md


