[ 
https://issues.apache.org/jira/browse/OAK-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486564#comment-15486564
 ] 

Thomas Mueller commented on OAK-4788:
-------------------------------------

> I think parser shouldn't play with ordering

I agree. Even if the order is irrelevant to Lucene (I'm not sure it is), 
reordering the terms is unexpected, so the parser should not do it.

> I think making unique or not shouldn't be parser's concern at all.

Removing duplicates makes sense when using "aggregate at query time", because 
it improves search speed and reduces memory usage. But I'm not sure whether 
Apache Lucene ignores a word that is specified multiple times; this needs to be 
verified.
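
To make the two behaviors concrete, here is a minimal sketch in plain Java. It 
is not Oak's parser code: the class and method names are hypothetical, and 
splitting on whitespace is an assumption made only for illustration. It 
contrasts keeping terms as written with the sort-and-unique behavior described 
in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Jackrabbit Oak.
public class TermOrderSketch {

    // Keeps terms in input order, duplicates included.
    public static List<String> keepOrder(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Sorts terms and drops duplicates, as the issue title describes.
    // TreeSet uses natural String ordering, so "-" sorts before letters.
    public static List<String> sortedUnique(String text) {
        return new ArrayList<>(new TreeSet<>(keepOrder(text)));
    }

    public static void main(String[] args) {
        System.out.println(keepOrder("hello - world"));    // [hello, -, world]
        System.out.println(sortedUnique("hello - world")); // [-, hello, world]
        System.out.println(keepOrder("test test"));        // [test, test]
        System.out.println(sortedUnique("test test"));     // [test]
    }
}
```

The sketch says nothing about how Lucene treats repeated terms; that still 
needs to be verified separately.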

> Fulltext parser sorts and unique-s parsed terms
> -----------------------------------------------
>
>                 Key: OAK-4788
>                 URL: https://issues.apache.org/jira/browse/OAK-4788
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Vikas Saurabh
>            Priority: Minor
>
> Pasting a bit of discussion from OAK-4705:
> {quote}
> bq. whether it's a good idea to sort entries ("hello - world" becomes "- 
> hello world") and make them unique ("test test" becomes "test").
> I think parser shouldn't play with ordering .. but I can see the rationale 
> that it allows consumer of parsed output to potentially have forward seeks in 
> their dictionaries. Otoh, I think making unique or not shouldn't be parser's 
> concern at all.
> I'd open a new issue to follow up on these aspects.
> {quote}
> /cc [~tmueller]



