+1

On 11/06/2020 23:01, Raphaël Ouazana-Sustowski wrote:
> Hi,
>
> Here is a proposal to make ElasticSearch optional in our distributed
> product/flavor/server.
>
> Comments are welcome.
>
>
> ## Why?
>
> Some people have expressed the need to use a distributed James without
> ElasticSearch:
> - in a comment here: https://issues.apache.org/jira/browse/JAMES-3086
> - one of our customers plans to deploy a distributed James server for
> serving POP3 encrypted emails. This deployment does not rely on
> search features. However, with the current Distributed James server,
> they are forced to rely on ElasticSearch email indexing.
>
> This wastes resources, as maintaining an ElasticSearch cluster that
> keeps up with the volume is expensive.
> Maintaining an ElasticSearch cluster when it is not needed is costly at
> several levels:
> - cost of the infrastructure to deploy it
> - cost of the people who have to maintain it
> - performance cost on James to index data unnecessarily
>
> ## How?
>
> Scanning search is a search implementation that runs on top of any
> mailbox implementation, even distributed ones, and does not require
> indexing data.
>
> Scanning search is tested at the component level (unit tests) and
> also passes the IMAP (MPT) tests on top of the Cassandra implementation,
> as well as the JMAP memory tests, so it delivers correct results. Of
> course, it does not support full text search.
>
> We should allow Distributed James to optionally rely on scanning search
> instead of ElasticSearch.
>
> - Scanning search should be recommended for deployments that rarely
> search data
> - ElasticSearch should be recommended when search is frequent or
> requires high performance
>
> We could use module choosing [1] to choose between scanning search and
> ElasticSearch.
>
> Note that scanning search introduces no other dependencies, as it is
> part of mailbox-store, and thus causes no risk of library clashes.
>
> Note also that metric collection and log collection using
> ElasticSearch are unaffected.
>
> ## Alternative
>
> The alternative would be to build a product/flavor/server separate from
> the distributed one, whose only difference from the distributed one
> would be that indexing relies on scanning instead of ElasticSearch.
>
> The maintenance cost of such a product/flavor/server is higher than that
> of a configuration option (Docker images to release, time and energy to
> run integration tests on it).
>
> Such a product/flavor is hard to brand: even though it answers a need,
> it is not different enough from the distributed one, nor does it answer
> needs that are very far from it.
>
> The advantage is that it would allow this solution to be fine-tuned to
> the exact needs.
>
> ## Work in Progress
>
> See pull request: https://github.com/linagora/james-project/pull/3425
>
> Regards,
>
> Raphaël.
>
>
>
> [1]
> https://github.com/apache/james-project/blob/master/src/adr/0036-against-use-of-conditional-statements-in-guice-modules.md
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
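
To illustrate the trade-off behind the proposal, here is a minimal Java sketch of the general idea of a scanning search. It is not the James implementation: the `Message` record, the `MessageSearch` interface and the `ScanningSearch` class are hypothetical simplifications, not the mailbox-store API. Nothing is indexed when mail is written; each query reads every message of the mailbox and keeps those matching the criterion, so each search costs time proportional to the mailbox size.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ScanningSearchSketch {

    // Hypothetical, simplified message model (not the James mailbox API).
    record Message(long uid, String from, String subject, long sizeInBytes) {}

    // Hypothetical search abstraction standing in for a message search index.
    interface MessageSearch {
        List<Long> search(List<Message> mailbox, Predicate<Message> criterion);
    }

    // Scanning search: no index is maintained on write; every query reads
    // all messages of the mailbox and keeps the UIDs of those that match.
    static final class ScanningSearch implements MessageSearch {
        @Override
        public List<Long> search(List<Message> mailbox, Predicate<Message> criterion) {
            return mailbox.stream()
                    .filter(criterion)
                    .map(Message::uid)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<Message> mailbox = List.of(
                new Message(1, "alice@example.com", "Quarterly report", 12_000),
                new Message(2, "bob@example.com", "Lunch?", 800),
                new Message(3, "carol@example.com", "Report draft", 45_000));

        MessageSearch search = new ScanningSearch();
        // Header and size criteria only: no full text search over bodies.
        List<Long> uids = search.search(mailbox,
                m -> m.subject().toLowerCase().contains("report") && m.sizeInBytes() > 10_000);
        System.out.println(uids); // prints [1, 3]
    }
}
```

Because the cost is paid at query time rather than at indexing time, this approach is cheap for deployments that rarely search, while an index such as ElasticSearch pays off when searches are frequent.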
