[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

Dirk Rudolph (JIRA) Wed, 30 Aug 2017 01:00:26 -0700

    [ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146835#comment-16146835 ]


Dirk Rudolph edited comment on OAK-6597 at 8/30/17 7:59 AM:
------------------------------------------------------------

{quote}
which if enabled would enable storage of ":fulltext" field created in any of 
the above ways
{quote}

That would mean that the excerpt is created from a stored field containing all 
indexed properties of all nested nodes, right? If so, there could be a corner 
case where the excerpt contains odd text spanning the boundary between two 
property values, no?

Example:

{code}
/content/foo
 + jcr:content 
  - text1 = "My fancy text"
  - text2 = "This isn't so fancy"
{code}

If I'm right, that would produce an excerpt like "My fancy <b>text</b> This 
isn't so fancy" or, even worse, without the space: "My fancy <b>text</b>This 
isn't so fancy". Wouldn't it make sense to store each nested property in its 
own analyzed field (full:_jcr_content/text1 or similar)?
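
To make the concern concrete, here is a minimal sketch in plain Python. It is 
not Oak or Lucene code: the property map, the naive substring highlighter and 
all function names are assumptions made up for illustration. It only shows how 
an excerpt taken from one concatenated field can run across a property 
boundary, while per-property fields keep each excerpt inside a single value.

```python
# Hypothetical stand-ins for the aggregated properties of /content/foo.
props = {
    "jcr:content/text1": "My fancy text",
    "jcr:content/text2": "This isn't so fancy",
}


def highlight(text, term):
    """Naive highlighter: wrap every occurrence of `term` in <b> tags."""
    return text.replace(term, "<b>" + term + "</b>")


def excerpt_from_single_field(props, term, separator=" "):
    """Excerpt built from one stored field holding all values joined together."""
    fulltext = separator.join(props.values())
    return highlight(fulltext, term)


def excerpts_per_property(props, term):
    """One excerpt per matching property; text never crosses a value boundary."""
    return {
        name: highlight(value, term)
        for name, value in props.items()
        if term in value
    }


print(excerpt_from_single_field(props, "text"))
print(excerpt_from_single_field(props, "text", separator=""))
print(excerpts_per_property(props, "text"))
```

With a single concatenated field the unrelated sentence from text2 ends up in 
the excerpt; with one field per property only the value of text1 is returned.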

Do we have any insight into how this would affect the index size and, with 
that, query performance against an index that has the feature enabled? 


was (Author: diru):
Do we have any insights what will be the impact on the index size and with that 
the impact on query performance against one index that has that feature 
enabled? 

> rep:excerpt not working for content indexed by aggregation in lucene
> --------------------------------------------------------------------
>
>                 Key: OAK-6597
>                 URL: https://issues.apache.org/jira/browse/OAK-6597
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.1, 1.7.6
>            Reporter: Dirk Rudolph
>             Fix For: 1.8
>
>         Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed through an aggregation are not 
> considered for excerpts (highlighting), as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  - bar (String)
>  + jcr:content [test:PageContent]
>   - bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which 
> exists in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but 
> not in both. For the former the excerpt is provided properly; for the latter 
> it isn't.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
