[jira] [Commented] (JAMES-2080) ES mapping: avoid using nested and use object if this affect performance

2022-05-05 Thread Benoit Tellier (Jira)


[ 
https://issues.apache.org/jira/browse/JAMES-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532154#comment-17532154
 ] 

Benoit Tellier commented on JAMES-2080:
---

For the header field, we query in a key/value fashion, and we relied on 
nested documents to do that without dynamic mappings.

As stated above, nested documents likely have major implications for dataset 
size and indexing time.

Today I found 
https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html 
which might be used as an alternative. 

We should evaluate its impact on index size and on indexing time.
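
To make the comparison concrete, here is a minimal sketch of what the two mappings and a key/value query could look like. The field names (`headers`, `name`, `value`) and the helper functions are assumptions for illustration only, not the actual James mapping. Note that the flattened type indexes leaf values as keywords, so full-text matching on header values is one of the things to evaluate.

```python
# Hypothetical mapping fragments for illustration; not the actual James mapping.
NESTED_MAPPING = {
    "properties": {
        "headers": {
            "type": "nested",  # one hidden Lucene document per header entry
            "properties": {
                "name": {"type": "keyword"},
                "value": {"type": "text"},
            },
        }
    }
}

# With the flattened type the whole object is indexed as a single field,
# and every leaf value is treated as a keyword.
FLATTENED_MAPPING = {"properties": {"headers": {"type": "flattened"}}}


def to_flattened_headers(entries):
    """Group (name, value) pairs into one object keyed by header name."""
    grouped = {}
    for name, value in entries:
        grouped.setdefault(name, []).append(value)
    return grouped


def header_term_query(name, value):
    """Build a term query on one key of the flattened field (dot notation)."""
    return {"query": {"term": {f"headers.{name}": value}}}
```

A document would then carry `to_flattened_headers([...])` under `headers`, and `header_term_query("key1", "value1")` would target the `headers.key1` key without requiring a dynamic mapping per header name.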

> ES mapping: avoid using nested and use object if this affect performance
> 
>
> Key: JAMES-2080
> URL: https://issues.apache.org/jira/browse/JAMES-2080
> Project: James Server
>  Issue Type: Improvement
>  Components: elasticsearch
>Reporter: Luc DUZAN
>Priority: Major
>
> This ticket should be done after 
> https://issues.apache.org/jira/browse/JAMES-2078.
> In our mapping we use nested for header, from, cc, bcc. We know that, in 
> theory, nested reduces performance (invisible documents are created to 
> handle nested values), so object should be used instead when possible.
> First, you should measure how significant the performance impact is. If the 
> performance loss introduced by nested is significant, you should then 
> assess the resulting loss of information and find a workaround for it; see:
> * https://www.elastic.co/guide/en/elasticsearch/reference/2.2/nested.html
> * https://www.elastic.co/guide/en/elasticsearch/reference/2.2/object.html
> For the moment, we think this loss of information is not an issue for FROM, 
> CC, BCC.
> But it will certainly be an issue for headers. A way to work around it would 
> be to transform the following:
> { headers: [{key: "key1", value: ["value1", "value2"]}, {key: "key2", value: 
> "something"}] }
> To that:
> { headers: ["key1:value1", "key1:value2", "key2:something"] }
> But further thought is needed to see whether this will work for the kinds of 
> queries we need to run on the headers.
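
The transformation proposed in the ticket description can be sketched as below. This is only an illustration under assumptions: the function name `flatten_headers` is hypothetical, and the input shape is taken from the example in the description, where `value` is either a single string or a list of strings.

```python
def flatten_headers(headers):
    """Turn [{key, value}, ...] entries into a flat list of "key:value" strings."""
    flat = []
    for entry in headers:
        values = entry["value"]
        # A header may carry one value (a string) or several (a list).
        if isinstance(values, str):
            values = [values]
        for value in values:
            flat.append(f'{entry["key"]}:{value}')
    return flat


document = {
    "headers": flatten_headers(
        [
            {"key": "key1", "value": ["value1", "value2"]},
            {"key": "key2", "value": "something"},
        ]
    )
}
```

With this shape each key/value pair stays associated inside a single string, which is what a plain object mapping would otherwise lose. Whether the queries needed on headers can be expressed against such strings remains the open question raised in the description.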



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org



[jira] [Commented] (JAMES-2080) ES mapping: avoid using nested and use object if this affect performance

2019-10-14 Thread Benoit Tellier (Jira)


[ 
https://issues.apache.org/jira/browse/JAMES-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951548#comment-16951548
 ] 

Benoit Tellier commented on JAMES-2080:
---

I doubt this changeset 
(https://github.com/linagora/james-project/pull/2736) is worth it.

We ran performance tests against it, and here are the results:

```
22.51 message indexed per second
6604 B per message (mean)
Mean default search 107 ms
P99 default search 397 ms
```

For the record, the master branch results were:

```
Reindexed/s:  22.46
Size per message: 5676 B
Mean default search: 108 ms
P99 default search: 357 ms
```

As of today there is no clear evidence, given the tests conducted, that this 
is an improvement.
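
For reference, the relative differences between the two runs can be computed as in the sketch below. The figures are the ones reported above; the dictionary layout and the `relative_change` helper are illustrative assumptions.

```python
def relative_change(new, old):
    """Relative change of `new` against `old`, in percent."""
    return (new - old) / old * 100.0


# Figures reported above: (changeset, master).
results = {
    "indexed per second": (22.51, 22.46),
    "bytes per message": (6604, 5676),
    "mean search (ms)": (107, 108),
    "P99 search (ms)": (397, 357),
}

changes = {
    name: round(relative_change(changeset, master), 1)
    for name, (changeset, master) in results.items()
}

for name, percent in changes.items():
    print(f"{name}: {percent:+.1f} %")
```

Indexing throughput and mean search latency differ by less than 1 %, while the size per message and the P99 latency are about 16 % and 11 % higher with the changeset, which is consistent with the conclusion above.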




[jira] [Commented] (JAMES-2080) ES mapping: avoid using nested and use object if this affect performance

2019-10-03 Thread Benoit Tellier (Jira)


[ 
https://issues.apache.org/jira/browse/JAMES-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944224#comment-16944224
 ] 

Benoit Tellier commented on JAMES-2080:
---

It turns out nested fields for address headers are not needed. See 
https://github.com/linagora/james-project/pull/2736
