[ https://issues.apache.org/jira/browse/OAK-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966772#comment-13966772 ]
Jukka Zitting commented on OAK-1723:
------------------------------------
+1. The indexed content always exists in the repository, so keeping a
duplicate copy in the search index is redundant. We can regenerate the
relevant Lucene document fields from the existing content whenever needed.

Furthermore, keeping copies of content in the search index is a security
problem, as it lets a user circumvent read access controls. For example,
say I know (or have a good guess) that a node exists at a given path but I
can't read it. I could then create a bunch of probe nodes based on a
dictionary and use similarity search to get a pretty good idea of what the
secret node looks like. If the similarity constraint were instead generated
by reading the relevant content normally from the repository, that read would
already be subject to access controls and we wouldn't have to worry about
them in the index.
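
To make the point concrete, here is a rough sketch of the idea in plain Java.
It is a toy model only: the class and method names (SimilarityAclSketch,
read, similarityTerms, similar) are made up for illustration and are not the
Oak or Lucene API. The assumption is simply that the terms for a "similar to
this path" constraint are built by reading the node with the caller's own
permissions instead of from a copy stored in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT the Oak or Lucene API: every name here is hypothetical.
final class SimilarityAclSketch {
    // path -> text content of the node
    private final Map<String, String> repository = new HashMap<>();
    // user -> paths that user is allowed to read
    private final Map<String, Set<String>> readable = new HashMap<>();

    void addNode(String path, String text, String... readers) {
        repository.put(path, text);
        for (String user : readers) {
            readable.computeIfAbsent(user, u -> new HashSet<>()).add(path);
        }
    }

    // Normal repository read: returns null when the user may not read the path.
    String read(String user, String path) {
        Set<String> paths = readable.get(user);
        if (paths == null || !paths.contains(path)) {
            return null;
        }
        return repository.get(path);
    }

    // Build the terms of a "similar to <path>" constraint from content that is
    // read with the caller's permissions, not from a copy kept in the index.
    Set<String> similarityTerms(String user, String path) {
        String text = read(user, path);
        if (text == null) {
            return Collections.emptySet();
        }
        return new TreeSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Return the readable nodes (in path order) sharing at least one term.
    List<String> similar(String user, String path) {
        Set<String> terms = similarityTerms(user, path);
        List<String> hits = new ArrayList<>();
        if (terms.isEmpty()) {
            return hits; // unreadable reference node: nothing to probe against
        }
        for (String candidate : new TreeSet<>(repository.keySet())) {
            if (candidate.equals(path)) {
                continue;
            }
            String text = read(user, candidate);
            if (text == null) {
                continue;
            }
            Set<String> shared =
                new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
            shared.retainAll(terms);
            if (!shared.isEmpty()) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With this shape, a user who cannot read the reference node gets an empty
constraint, so dictionary-based probe nodes reveal nothing about it.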
> Text content should not be stored as part of Index data
> -------------------------------------------------------
>
> Key: OAK-1723
> URL: https://issues.apache.org/jira/browse/OAK-1723
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Chetan Mehrotra
> Assignee: Alex Parvulescu
> Fix For: 1.0, 1.1
>
>
> As part of OAK-319, the Lucene indexer currently stores the indexed content
> as part of the index. This has an adverse effect on performance, as noted in
> OAK-1702. To improve performance, this should be disabled.