Re: Distributed James: make ElasticSearch indexing optional?

Tellier Benoit Thu, 11 Jun 2020 19:06:29 -0700

+1

On 11/06/2020 23:01, Raphaël Ouazana-Sustowski wrote:
> Hi,
> 
> Here is a proposal to make ElasticSearch optional in our distributed
> product/flavor/server.
> 
> Comments are welcome.
> 
> 
> ## Why?
> 
> Some people have expressed the need to use a distributed James without
> ElasticSearch:
> - in a comment here: https://issues.apache.org/jira/browse/JAMES-3086
> - one of our customers plans to deploy a distributed James server for
> serving POP3 encrypted emails. This deployment does not rely on
> search features. However, with the current Distributed James server,
> they are forced to rely on ElasticSearch email indexing.
> 
> This wastes resources: an ElasticSearch cluster sized to keep up with
> the mail volume is expensive.
> Maintaining such a cluster when it is not needed is costly at
> several levels:
> - cost of infrastructure to deploy it
> - cost of people having to maintain it
> - performance cost on James to unnecessarily index data
> 
> ## How?
> 
> Scanning search is a search implementation that runs on top of any
> mailbox implementation, even distributed ones, and does not require
> indexing data.
> 
> Scanning search is tested at the component level (unit tests) and
> also passes IMAP (MPT) tests on top of the Cassandra implementation, as
> well as JMAP memory tests, so it delivers correct results. Of course, it
> does not support full text search.
> 
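
For illustration, the idea of scanning search described above can be
sketched as follows. This is a minimal sketch, not the James
implementation: the class ScanningSearchSketch, the Message type and the
subjectContains helper are hypothetical names invented for this example.
It only shows the principle: every message is examined at query time, so
no index has to be built or kept up to date.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; these are not James mailbox classes.
class ScanningSearchSketch {

    // Simplified stand-in for a stored message (metadata only).
    static final class Message {
        final long uid;
        final String subject;
        final boolean seen;

        Message(long uid, String subject, boolean seen) {
            this.uid = uid;
            this.subject = subject;
            this.seen = seen;
        }
    }

    // Example criterion: case-insensitive match on the subject header.
    static Predicate<Message> subjectContains(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        return m -> m.subject.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Scans every message of the mailbox and returns the UIDs that match.
    // Nothing is indexed ahead of time, so the cost of each search grows
    // linearly with the number of messages.
    static List<Long> search(List<Message> mailbox, Predicate<Message> criterion) {
        return mailbox.stream()
            .filter(criterion)
            .map(m -> m.uid)
            .collect(Collectors.toList());
    }
}
```

A call such as search(mailbox, subjectContains("invoice")) walks the
whole mailbox each time, which is why this approach suits deployments
that search rarely.
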
> We should allow Distributed James to optionally rely on scanning search
> instead of ElasticSearch.
> 
>  - Scanning search should be advised for deployments that rarely search
> data
>  - ElasticSearch should be advised when search is frequent or requires
> high performance
> 
> We could use module choosing [1] to choose between scanning search and
> ElasticSearch.
> 
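
A minimal sketch of what choosing a module from configuration could look
like, in plain Java without Guice so that it stays self-contained. All
names here (SearchModuleChooserSketch, SearchImplementation, the two
module classes) are hypothetical, and defaulting to ElasticSearch when
nothing is configured is an assumption made to preserve current
behaviour; the actual pull request may differ.

```java
import java.util.Locale;

// Hypothetical illustration only; these are not James or Guice classes.
class SearchModuleChooserSketch {

    enum SearchImplementation { SCANNING, ELASTICSEARCH }

    // Stand-in for an injection module that binds the search index.
    interface SearchModule {
        String describe();
    }

    static final class ScanningSearchModule implements SearchModule {
        public String describe() {
            return "scanning search, no indexing";
        }
    }

    static final class ElasticSearchModule implements SearchModule {
        public String describe() {
            return "ElasticSearch indexing";
        }
    }

    // Parses the configured value. Assumption: a missing or empty value
    // keeps ElasticSearch, so existing deployments are unaffected.
    static SearchImplementation parse(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return SearchImplementation.ELASTICSEARCH;
        }
        return SearchImplementation.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    // The choice is made once, while assembling the server, so neither
    // module needs a conditional statement inside it.
    static SearchModule choose(SearchImplementation implementation) {
        switch (implementation) {
            case SCANNING:
                return new ScanningSearchModule();
            case ELASTICSEARCH:
                return new ElasticSearchModule();
            default:
                throw new IllegalArgumentException("Unknown implementation: " + implementation);
        }
    }
}
```

With this shape, choose(parse("scanning")) yields the scanning module,
and an unknown value fails fast at startup with an
IllegalArgumentException from valueOf.
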
> Note that scanning search introduces no other dependencies, as it is
> part of mailbox-store, and thus causes no risk of library clashes.
> 
> Note also that metric collection and log collection using
> ElasticSearch are unaffected.
> 
> ## Alternative
> 
> The alternative would be to build a different product/flavor/server than
> the distributed one, whose only difference from the distributed one
> is that search relies on scanning instead of ElasticSearch.
> 
> The maintenance cost of such a product/flavor/server is higher than that
> of a configuration option (Docker images to release, time and energy to
> run integration tests on it).
> 
> Such a product/flavor is hard to brand because, even if it answers a
> need, it is not far from the distributed one, and does not answer needs
> that are very different from it either.
> 
> The advantage is that it would allow fine-tuning this solution to
> answer the exact needs.
> 
> ## Work in Progress
> 
> See pull request: https://github.com/linagora/james-project/pull/3425
> 
> Regards,
> 
> Raphaël.
> 
> 
> 
> [1]
> https://github.com/apache/james-project/blob/master/src/adr/0036-against-use-of-conditional-statements-in-guice-modules.md
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

