Hi,
Here is a proposal to make ElasticSearch optional in our distributed
product/flavor/server.
Comments are welcome.
## Why?
Some people have expressed the need to use a distributed James without
ElasticSearch:
- in some comment here: https://issues.apache.org/jira/browse/JAMES-3086
- one of our customers plans to deploy a distributed James server for
serving encrypted emails over POP3. This deployment does not rely on
search features. However, the current Distributed James server forces
them to rely on ElasticSearch email indexing anyway.
Maintaining an ElasticSearch cluster that is not needed, and sizing it
to keep up with the volume, wastes resources at several levels:
- cost of infrastructure to deploy it
- cost of people having to maintain it
- performance cost for James of indexing data unnecessarily
## How?
Scanning search is a search implementation that runs on top of any
mailbox implementation, including distributed ones, and does not
require indexing data.
Scanning search is tested at the component level (unit tests) and also
passes the IMAP (MPT) tests on top of the Cassandra implementation, as
well as the JMAP memory tests, so it delivers correct results. It does
not, however, support full-text search.
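
To illustrate the principle, here is a minimal sketch of a scanning
search, not the actual mailbox-store code. The `StoredMessage` and
`ScanningSearch` names are hypothetical and exist only for this example.
Every query walks all messages of the mailbox and evaluates the
criterion on each one, so nothing is indexed ahead of time:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical, simplified message model used only for this sketch.
final class StoredMessage {
    final long uid;
    final String from;
    final String subject;

    StoredMessage(long uid, String from, String subject) {
        this.uid = uid;
        this.from = from;
        this.subject = subject;
    }
}

// Hypothetical helper: evaluates the criterion against every message.
final class ScanningSearch {
    // Returns the UIDs of matching messages, in mailbox order.
    // No index is read or written; the whole mailbox content is scanned.
    static List<Long> search(List<StoredMessage> mailboxContent,
                             Predicate<StoredMessage> criterion) {
        return mailboxContent.stream()
            .filter(criterion)
            .map(message -> message.uid)
            .collect(Collectors.toList());
    }
}
```

The cost of each query grows with the size of the mailbox, which is why
this approach suits deployments that search rarely.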
We should allow Distributed James to optionally rely on scanning search
instead of ElasticSearch.
- Scanning search should be recommended for deployments that rarely
search data
- ElasticSearch should be recommended when search is frequent or
requires high performance
We could use module choosing [1] to choose between scanning search and
ElasticSearch.
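
As a rough sketch of what module choosing could look like here: the
configuration is parsed once into a choice, and the choice maps to a
list of modules to install, so that no conditional statement is needed
inside a module. Everything below is an assumption made for
illustration: the `implementation` property, the
`SearchImplementation` and `SearchModuleChooser` names, and the module
names (plain strings standing in for Guice modules) are hypothetical
and not taken from the pull request.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical enum describing the available search back-ends.
enum SearchImplementation {
    ELASTICSEARCH,
    SCANNING;

    // Parses the hypothetical "implementation" property. Defaults to
    // ElasticSearch so that existing deployments keep their behaviour.
    static SearchImplementation parse(Map<String, String> properties) {
        String value = properties.getOrDefault("implementation", "elasticsearch");
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "elasticsearch":
                return ELASTICSEARCH;
            case "scanning":
                return SCANNING;
            default:
                throw new IllegalArgumentException(
                    "Unknown search implementation: " + value);
        }
    }
}

// Hypothetical chooser: maps the parsed choice to the modules to install.
final class SearchModuleChooser {
    // Module names are placeholders; a real server would return
    // Guice Module instances instead of strings.
    static List<String> chooseModules(SearchImplementation implementation) {
        switch (implementation) {
            case ELASTICSEARCH:
                return List.of("ElasticSearchClientModule",
                               "ElasticSearchMailboxIndexModule");
            case SCANNING:
                return List.of("ScanningSearchModule");
            default:
                throw new IllegalStateException("Unhandled: " + implementation);
        }
    }
}
```

In this sketch the server entry point would call
`SearchModuleChooser.chooseModules(SearchImplementation.parse(properties))`
once at startup and combine the result with the other modules.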
Note that scanning search introduces no additional dependencies: it is
part of mailbox-store, so there is no risk of library clashes.
Note also that metric collection and log collection using ElasticSearch
are unaffected.
## Alternative
The alternative would be to build a separate product/flavor/server
whose only difference from the distributed one is that search relies
on scanning instead of ElasticSearch.
The maintenance cost of such a product/flavor/server is higher than
that of a configuration option (Docker images to release, time and
energy to run integration tests on it).
Such a product/flavor is hard to brand: even though it answers a need,
it is not far from the distributed one, and it does not answer needs
that are very far from it either.
The advantage is that it would allow finer tuning of this solution to
the exact needs.
## Work in Progress
See pull request: https://github.com/linagora/james-project/pull/3425
Regards,
Raphaël.
[1] https://github.com/apache/james-project/blob/master/src/adr/0036-against-use-of-conditional-statements-in-guice-modules.md