[ 
https://issues.apache.org/jira/browse/MAILBOX-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Tellier updated MAILBOX-403:
-----------------------------------
    Description: 
h2. What

I discovered that the main body part, holding the text of an email, and already 
indexed as part of textBody/htmlBody properties, is also indexed as an 
attachment.

This behaviour is functionally wrong, as it returns attachment hits for terms 
contained in the body of the message. 

It also causes a larger index size, meaning higher disk costs and higher 
latencies.

h2. Definition of done

Unit tests demonstrating that, in ElasticSearch, main bodies are no longer 
indexed as attachments.

h2. How

When turning child subparts into attachments (flattening), only keep MIME 
parts that explicitly have a Content-Disposition (either inline or attachment).

As a side effect, this avoids indexing multiparts as attachments (they were not 
previously filtered out).
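
The filtering rule above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of the proposed fix: the MimePart class, the AttachmentFlattening class and its method names are hypothetical stand-ins, and no James or mime4j API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class AttachmentFlattening {

    // Hypothetical, simplified model of a parsed MIME part (not a James type).
    static final class MimePart {
        final String contentType;
        final Optional<String> contentDisposition; // raw header value, if present
        final List<MimePart> children;

        MimePart(String contentType, Optional<String> contentDisposition, List<MimePart> children) {
            this.contentType = contentType;
            this.contentDisposition = contentDisposition;
            this.children = children;
        }
    }

    // Walks the child subparts depth-first and keeps only the parts that
    // explicitly declare a disposition of "inline" or "attachment".
    static List<MimePart> flattenAttachments(MimePart root) {
        List<MimePart> kept = new ArrayList<>();
        for (MimePart child : root.children) {
            collect(child, kept);
        }
        return kept;
    }

    private static void collect(MimePart part, List<MimePart> kept) {
        if (hasExplicitDisposition(part)) {
            kept.add(part);
        }
        for (MimePart child : part.children) {
            collect(child, kept);
        }
    }

    // Compares only the disposition type, ignoring parameters such as filename.
    static boolean hasExplicitDisposition(MimePart part) {
        return part.contentDisposition
            .map(value -> value.split(";", 2)[0].trim().toLowerCase(Locale.ROOT))
            .filter(type -> type.equals("inline") || type.equals("attachment"))
            .isPresent();
    }

    public static void main(String[] args) {
        MimePart text = new MimePart("text/plain", Optional.empty(), List.of());
        MimePart html = new MimePart("text/html", Optional.empty(), List.of());
        MimePart alternative = new MimePart("multipart/alternative", Optional.empty(), List.of(text, html));
        MimePart pdf = new MimePart("application/pdf",
            Optional.of("attachment; filename=report.pdf"), List.of());
        MimePart message = new MimePart("multipart/mixed", Optional.empty(), List.of(alternative, pdf));

        // Body parts and the multipart container carry no disposition, so only the PDF is kept.
        for (MimePart part : flattenAttachments(message)) {
            System.out.println(part.contentType); // prints: application/pdf
        }
    }
}
```

In this sketch the text/plain and text/html bodies, as well as the multipart/alternative container, are skipped because none of them carries a Content-Disposition.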

Proposed fix: https://github.com/linagora/james-project/pull/4152

  was:
## What

I discovered that the main body part, holding the text of an email, and already 
indexed as part of textBody/htmlBody properties, is also indexed as an 
attachment.

This behaviour is functionally wrong, as it returns attachment hits for terms 
contained in the body of the message. 

It also causes a larger index size, meaning higher disk costs and higher 
latencies.

## Definition of done

Unit tests demonstrating that, in ElasticSearch, main bodies are no longer 
indexed as attachments.

## How

When turning child subparts into attachments (flattening), only keep MIME 
parts that explicitly have a Content-Disposition (either inline or attachment).

As a side effect, this avoids indexing multiparts as attachments (they were not 
previously filtered out).


> Email main body is also indexed as an attachment
> ------------------------------------------------
>
>                 Key: MAILBOX-403
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-403
>             Project: James Mailbox
>          Issue Type: Bug
>            Reporter: Benoit Tellier
>            Priority: Major
>
> h2. What
> I discovered that the main body part, holding the text of an email, and 
> already indexed as part of textBody/htmlBody properties, is also indexed as 
> an attachment.
> This behaviour is functionally wrong, as it returns attachment hits for terms 
> contained in the body of the message. 
> It also causes a larger index size, meaning higher disk costs and higher 
> latencies.
> h2. Definition of done
> Unit tests demonstrating that, in ElasticSearch, main bodies are no longer 
> indexed as attachments.
> h2. How
> When turning child subparts into attachments (flattening), only keep MIME 
> parts that explicitly have a Content-Disposition (either inline or 
> attachment).
> As a side effect, this avoids indexing multiparts as attachments (they were 
> not previously filtered out).
> Proposed fix: https://github.com/linagora/james-project/pull/4152



