[ 
https://issues.apache.org/jira/browse/MAILBOX-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tellier Benoit updated MAILBOX-155:
-----------------------------------
    Attachment: MAILBOX-155.patch

This patch contains our implementation of an ElasticSearch index.

# **Features**

It is an implementation for a MessageSearch index. It will behave well in a 
distributed context.

Mails are indexed so that other applications can get good search 
results. For example:

 - Mailbox-related events are also indexed (this allows you to write queries 
that, for instance, search for an e-mail in all mailboxes of a user).
 - Attachments are indexed if they are text (I will submit support for 
arbitrary attachments tomorrow).

# *Note before running tests*

*Difficulties we encountered*

As we have conflicting dependencies with the Apache Lucene module, we chose 
to "do things in a different way":

 - We used Jest as a client.
 - We developed our own query builder.

Note that because of this problem, running the tests requires a specific test 
environment with ElasticSearch configured. That is why I commented out some 
tests.

*Sub project structure*

In the ElasticSearch module you will find:
 - Indexes (we have 3 different implementations)
     - One that indexes directly in ElasticSearch
     - One that indexes through a Kafka queue (decouples mail processing from 
indexing)
     - One that uses an embedded Kafka (fewer infrastructure requirements)
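
To illustrate the idea behind the queue-based variants, here is a minimal sketch, not the code from the patch. It assumes an in-memory BlockingQueue standing in for Kafka, and the class and method names (QueuedIndexerSketch, submit, drainTo) are hypothetical. Mail processing only enqueues the JSON document and returns; a separate worker later pushes the queued documents to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: an in-memory queue plays the role that Kafka plays in the patch.
final class QueuedIndexerSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Called from mail processing: enqueue the document and return immediately. */
    void submit(String messageJson) {
        pending.add(messageJson);
    }

    /**
     * Called by a separate worker: hands every document queued so far to the
     * given sink (for example, something that writes to ElasticSearch) and
     * returns how many documents were handed over.
     */
    int drainTo(Consumer<String> indexSink) {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        batch.forEach(indexSink);
        return batch.size();
    }
}
```

With this shape, a slow or unavailable index only delays the draining side; the submitting side is not blocked.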

The Kafka modules require the river to be configured: 
https://ci.open-paas.org/stash/projects/JWC/repos/kafka-river/browse

There is also a bulk generation unit and a query unit (which converts James 
search requests into ElasticSearch requests).
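
For readers unfamiliar with bulk requests: the ElasticSearch bulk API takes a newline-delimited body in which each document is preceded by an action line. The sketch below shows that format only; it is not the bulk unit from the patch. The class name, the index and type names, and the use of the message UID as document id are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch of building a bulk request body; names are illustrative.
final class BulkBodySketch {
    private BulkBodySketch() {}

    /**
     * Builds a bulk body that indexes each document under its UID.
     * Assumes index and type contain no characters that need JSON escaping,
     * and that every document is already serialized as single-line JSON.
     */
    static String indexActions(String index, String type, Map<Long, String> jsonByUid) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<Long, String> entry : jsonByUid.entrySet()) {
            // Action line: says what to do with the document on the next line.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type)
                .append("\",\"_id\":\"").append(entry.getKey())
                .append("\"}}\n");
            // Source line: the document itself, terminated by a newline.
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }
}
```

The trailing newline after the last document is part of the format.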

You will find our query builder in the dsl folder. It implements only the 
operations we need. We had to write it because Jest requires queries either 
as a String or built with the ElasticSearch query builder (which relies on 
Lucene).
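
As an illustration of what such a builder amounts to, here is a minimal sketch that produces query strings without any Lucene or ElasticSearch dependency. It is not the DSL from the patch: the class and method names are hypothetical, and it covers only a match clause and a bool/must conjunction.

```java
// Hypothetical sketch of a string-producing query builder; not the patch's DSL.
final class QuerySketch {
    private QuerySketch() {}

    /** Wraps a raw value in double quotes, escaping what would break a JSON string. */
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Produces {"match":{"<field>":"<text>"}}. */
    static String match(String field, String text) {
        return "{\"match\":{" + quote(field) + ":" + quote(text) + "}}";
    }

    /** Produces {"bool":{"must":[...]}}: every clause has to match. */
    static String and(String... clauses) {
        return "{\"bool\":{\"must\":[" + String.join(",", clauses) + "]}}";
    }

    /** Wraps a query clause into a complete search request body. */
    static String search(String query) {
        return "{\"query\":" + query + "}";
    }
}
```

For example, `QuerySketch.search(QuerySketch.and(QuerySketch.match("subject", "hello"), QuerySketch.match("from", "bob")))` yields a plain String that can be passed to a client that accepts string queries.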

We need to generate JSON from messages. We did this in the store module, as 
we thought other parts of James might need it some day.

We also created an Exception dedicated to an offline ElasticSearch.
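
A dedicated exception lets callers tell "the index is unreachable" apart from other failures. The sketch below shows one possible shape and is not the class from the patch: the names are hypothetical, and treating java.net.ConnectException as the sign of an offline server is an assumption.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch; the exception in the patch may be named and shaped differently.
class IndexOfflineExceptionSketch extends Exception {
    IndexOfflineExceptionSketch(String message, Throwable cause) {
        super(message, cause);
    }

    /** A call against the index that may fail with an I/O error. */
    interface IndexCall<T> {
        T run() throws IOException;
    }

    /**
     * Runs the call and translates a refused connection into the dedicated
     * exception. Other I/O errors are propagated unchanged.
     */
    static <T> T guard(IndexCall<T> call) throws IndexOfflineExceptionSketch, IOException {
        try {
            return call.run();
        } catch (ConnectException e) {
            throw new IndexOfflineExceptionSketch("ElasticSearch is unreachable", e);
        }
    }
}
```

A caller can then catch this one type to, for instance, fall back to another search strategy or report a clear error.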

*What should I do before running tests?*

Tests are provided for each added component.

To test ElasticSearch integration without Kafka:

requirement : Standalone ElasticSearch
uncomment : 
elasticsearch/src/test/java/org/apache/james/mailbox/elasticsearch/search/index/DirectMessageSearchIndexTest.java

To test ElasticSearch integration with Kafka:

requirement : Standalone ElasticSearch with configured river, Standalone Kafka
uncomment : 
elasticsearch/src/test/java/org/apache/james/mailbox/elasticsearch/search/index/KafkaMessageSearchIndexTest.java

To test ElasticSearch integration through EmbeddedKafka:

requirement : ElasticSearch with a configured river
uncomment : 
elasticsearch/src/test/java/org/apache/james/mailbox/elasticsearch/search/index/EmbeddedKafkaSearchIndexTest.java

> Add elasticsearch based search index
> ------------------------------------
>
>                 Key: MAILBOX-155
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-155
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Norman Maurer
>            Assignee: Norman Maurer
>         Attachments: MAILBOX-155.patch
>
>



