[ https://issues.apache.org/jira/browse/MAILBOX-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Tellier updated MAILBOX-403:
-----------------------------------
Description:
h2. What
I discovered that the main body part, holding the text of an email, and already
indexed as part of textBody/htmlBody properties, is also indexed as an
attachment.
This behaviour is functionally wrong, as it returns attachment hits for terms
contained in the body of the message.
It also causes a larger index size, meaning higher disk costs and higher
latencies.
h2. Definition of done
Unit tests demonstrating that main bodies are no longer indexed as
attachments in ElasticSearch.
h2. How
When turning child subparts into attachments (flattening), keep only MIME
parts that explicitly have a Content-Disposition (either inline or attachment).
As a side effect, this also avoids indexing multiparts as attachments (they
were not filtered out before).
Proposed fix: https://github.com/linagora/james-project/pull/4152
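
To illustrate the filtering rule described under "How", here is a minimal sketch. It is not the code from the proposed fix: MimePart, hasExplicitDisposition and flattenAttachments are hypothetical names invented for this example, and the part model is deliberately simplified rather than taken from James or mime4j. It also assumes that the children of a part are always visited, even when the part itself is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class AttachmentFlattening {

    /** Hypothetical, simplified MIME part (not the James or mime4j model). */
    public static final class MimePart {
        public final String contentType;
        public final Optional<String> contentDisposition; // raw header value, if any
        public final List<MimePart> children;

        public MimePart(String contentType, Optional<String> contentDisposition, List<MimePart> children) {
            this.contentType = contentType;
            this.contentDisposition = contentDisposition;
            this.children = children;
        }
    }

    /** True only when the part explicitly declares "inline" or "attachment". */
    static boolean hasExplicitDisposition(MimePart part) {
        return part.contentDisposition
            // Keep the disposition type, drop parameters such as filename=...
            .map(value -> value.split(";", 2)[0].trim().toLowerCase(Locale.ROOT))
            .filter(type -> type.equals("inline") || type.equals("attachment"))
            .isPresent();
    }

    /** Flattens the subparts of root, keeping only parts with an explicit disposition. */
    public static List<MimePart> flattenAttachments(MimePart root) {
        List<MimePart> result = new ArrayList<>();
        for (MimePart child : root.children) {
            collect(child, result);
        }
        return result;
    }

    private static void collect(MimePart part, List<MimePart> result) {
        if (hasExplicitDisposition(part)) {
            result.add(part);
        }
        // Assumption: nested parts are always visited, whether or not the parent was kept.
        for (MimePart child : part.children) {
            collect(child, result);
        }
    }
}
```

With this rule, a text/plain or text/html body part without a Content-Disposition header is skipped, and so is a multipart/alternative container, because neither declares a disposition.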
was:
## What
I discovered that the main body part, holding the text of an email, and already
indexed as part of textBody/htmlBody properties, is also indexed as an
attachment.
This behaviour is functionally wrong, as it returns attachment hits for terms
contained in the body of the message.
It also causes a larger index size, meaning more disk costs, and higher
latencies.
## Definition of done
Unit tests demonstrating ElasticSearch main bodies are no longer indexed as
attachments.
## How
Upon turning children subparts into attachment (flattening) only keep mime
parts that explicitly have a content-disposition (either inline or attachment).
This by the way avoids indexing multiparts as attachments (they were not
filtered out...)
> Email main body is also indexed as an attachment
> ------------------------------------------------
>
> Key: MAILBOX-403
> URL: https://issues.apache.org/jira/browse/MAILBOX-403
> Project: James Mailbox
> Issue Type: Bug
> Reporter: Benoit Tellier
> Priority: Major