[
https://issues.apache.org/jira/browse/MAILBOX-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724029#comment-15724029
]
Tellier Benoit commented on MAILBOX-280:
----------------------------------------
This behaviour is expected.
Please have a look at
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
which explains how scoring works in ElasticSearch. In short, it relies
on:
- Field length: the longer the field, the worse.
- Term frequency: the more often a searched word appears in the field, the
better.
- Inverse document frequency: the more often a searched word appears across
all documents in the index, the worse.
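
As a rough illustration of these three factors, here is a minimal sketch
following the formulas given on the page linked above (classic Lucene TF/IDF).
The function names are mine, and term_weight is a simplification: the real
scoring function has more factors (query normalization, coordination, boosts).

```python
import math


def term_frequency(occurrences: int) -> float:
    """The more often the term appears in the field, the higher the value."""
    return math.sqrt(occurrences)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """The more documents of the index contain the term, the lower the value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """The longer the field, the lower the value."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(occurrences: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified weight of one term in one field: product of the three factors."""
    return (term_frequency(occurrences)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))
```

With this sketch, the same word weighs more in a short field than in a long
one, and a rare word weighs more than a word present in most documents.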
Then one other operation you should know about in ES is "tokenization". Before
indexing, ElasticSearch analyzes your text into tokens, and it does the same
with your query.
Here "Bodyyyyyy" gets merged into "body" and will match both "body" and
"bodyyyyyyy", as it is the same content in the index. Likewise, tokenizers
consider '-' a separator, and "Openpaas-linagora" will be tokenized as
"Openpaas" and "Linagora".
Finally, on complex queries, ES computes the score as the sum of the scores of
the individual words. This is expected; it is the way it works. It still adds
some proximity information.
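
The hyphen splitting and the per-word sum described above can be sketched as
follows. This is a toy model, not the ES analyzer: tokenize only splits on
non-alphanumeric characters and lowercases, it does not reproduce the merging
of "Bodyyyyyy" into "body", and proximity is ignored. The weights in
TERM_WEIGHTS are hypothetical values made up for the illustration.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Toy analyzer: split on non-alphanumeric characters and lowercase."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def score(query: str, document: str, term_weights: Dict[str, float]) -> float:
    """Sum the weights of the query tokens that also occur in the document."""
    document_tokens = set(tokenize(document))
    return sum(term_weights.get(token, 0.0)
               for token in tokenize(query)
               if token in document_tokens)


# Hypothetical per-term weights, not real ES scores.
TERM_WEIGHTS = {"openpaas": 0.5, "linagora": 1.0}
```

Under these assumptions, querying "OpenPaas-Other" against a document
containing "Openpaas-Linagora" still matches on the "openpaas" token, only
with a lower score than the full "OpenPaas-Linagora" query.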
Such features are expected from a decent search. For instance, I may not
remember which company edits OpenPaas, query "OpenPaas-Other", and still get
results.
The problem is that we swallow the ES relevance score, which is not used at all
for sorting messages. Thus, as "OpenPaas-Other" is a partial match (with
relevance 0.1) and "OpenPaas-Linagora" a good match (relevance 1.5), if I do
not take the score into account, I might end up with "OpenPaas-Other" being
reported before "OpenPaas-Linagora".
Finally, I would not open a MAILBOX ticket when mentioning JMAP-related stuff.
If JMAP wants to query it differently, we should handle that from the JMAP
layer.
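
Coming back to the sorting problem mentioned earlier, here is a small sketch of
what happens when the relevance score is dropped. The Hit class and the
received_order field are hypothetical (I assume, for the illustration only,
that results end up ordered by arrival); the scores 0.1 and 1.5 are the ones
from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject: str
    score: float         # relevance reported by the search engine
    received_order: int  # hypothetical: larger means received more recently


HITS = [
    Hit("OpenPaas-Other", 0.1, received_order=2),
    Hit("OpenPaas-Linagora", 1.5, received_order=1),
]


def sort_ignoring_score(hits: List[Hit]) -> List[Hit]:
    """Order by arrival only: the relevance score is swallowed."""
    return sorted(hits, key=lambda hit: hit.received_order, reverse=True)


def sort_by_score(hits: List[Hit]) -> List[Hit]:
    """Order by relevance: the best match comes first."""
    return sorted(hits, key=lambda hit: hit.score, reverse=True)
```

Ignoring the score puts the partial match "OpenPaas-Other" first, while
sorting by score reports "OpenPaas-Linagora" first.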
> String FilterCondition in getMessageList request should work correctly when
> including white-spaces or hyphens
> -------------------------------------------------------------------------------------------------------------
>
> Key: MAILBOX-280
> URL: https://issues.apache.org/jira/browse/MAILBOX-280
> Project: James Mailbox
> Issue Type: Bug
> Reporter: Laura Royet
>
> When making a getMessageList request with a String FilterCondition containing
> a white-space or a hyphen, the integration test becomes green even with
> a wrong word ending or one wrong word.
> Examples:
> When the email contains "Openpaas-Linagora"
> a filtering on "Openpaassssss-Linagora" in the property "text" of
> FilterCondition is matching. It is the same for the following String :
> "Openpaas-Linagoraaaaaaa", "bla OpenPaas", "OpenPaas anticonstitutionn".
> When the email contains "Test body"
> a filtering on "Testyyy body" in the property "body" of FilterCondition is
> matching. It is the same for: "Testy bodyyyyy", "Test gakayanakj", "halabdp
> body".
> Can be reproduced respectively in tests
> "messageWithComplicatedAttachmentShouldHaveItsEmailBodyIndexed()" in
> "SetMessagesMethodTest.java" in "package
> org.apache.james.jmap.methods.integration"