[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146602#comment-16146602
 ] 

Chetan Mehrotra commented on OAK-6597:
--------------------------------------

This is indeed an oversight. Looking at the current flow, there are a few 
inconsistencies around stored fields (which are required for excerpt support):

# ":fulltext" fields created by Binary text extraction are always stored 
(BinaryTextExtractor#newBinary)
# ":fulltext" fields created by nodeScopeIndex marked fields are not stored
# ":fulltext" fields created by aggregated fields are also not stored 

One way would be to expose an index config "excerptEnabled" which, if enabled, 
would enable storage of the ":fulltext" field created in any of the above ways. 
It would have the following behaviour:

# If not set, the current behaviour remains: #1 is stored, #2 and #3 are not
# If set, the value applies to all three modes: true stores them all, false stores none

This would ensure that the config keeps backward compatibility.
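
The tri-state rule above could be sketched roughly as follows. This is only an illustration of the proposal, not Oak code: the class {{FulltextStoragePolicy}}, the enum {{Source}} and the method {{isStored}} are hypothetical names, and a {{null}} flag stands for "excerptEnabled not set in the index definition".

```java
/** Hypothetical sketch of the proposed "excerptEnabled" behaviour; not part of Oak. */
final class FulltextStoragePolicy {

    /** The three ways a ":fulltext" field gets created, as listed above. */
    enum Source { BINARY_EXTRACTION, NODE_SCOPE_INDEX, AGGREGATION }

    private FulltextStoragePolicy() {
    }

    /**
     * Decides whether a ":fulltext" field should be stored.
     *
     * @param excerptEnabled assumed config value; null means the property is not set
     * @param source         how the ":fulltext" field was produced
     */
    static boolean isStored(Boolean excerptEnabled, Source source) {
        if (excerptEnabled == null) {
            // Not set: keep the current behaviour, only binary extraction is stored.
            return source == Source.BINARY_EXTRACTION;
        }
        // Explicitly set: the same value applies to all three sources.
        return excerptEnabled;
    }
}
```

Using a nullable flag rather than a boolean with a default is what keeps existing index definitions unchanged: only an explicit true or false alters how any of the three sources is stored.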

[~catholicon] [~teofili] Thoughts?

> rep:excerpt not working for content indexed by aggregation in lucene
> --------------------------------------------------------------------
>
>                 Key: OAK-6597
>                 URL: https://issues.apache.org/jira/browse/OAK-6597
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.1, 1.7.6
>            Reporter: Dirk Rudolph
>             Fix For: 1.8
>
>         Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in 
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
> former one the excerpt is properly provided; for the latter one it isn't.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
