[ 
https://issues.apache.org/jira/browse/OAK-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966772#comment-13966772
 ] 

Jukka Zitting commented on OAK-1723:
------------------------------------

+1. The indexed content always exists in the repository, so keeping a duplicate 
copy in the search index is redundant. We can regenerate the relevant Lucene 
document fields from the existing content whenever needed.

Furthermore, keeping copies of content in the search index is a security 
problem, as it allows a user to circumvent read access controls. For example, 
say I know (or have a good guess) that a node exists at a given path but I 
can't read it. Then I could generate a bunch of probe nodes based on a 
dictionary and use similarity search to get a pretty good idea of what the 
secret node looks like. If the similarity constraint were instead generated by 
reading the relevant content from the repository in the normal way, the usual 
read access checks would apply and this leak would not be possible.
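
To make the difference concrete, here is a rough, self-contained sketch in plain 
Java. It does not use the Oak or Lucene APIs; the class SimilarityProbeSketch 
and all of its methods are hypothetical names invented for illustration, and 
"similarity" is reduced to counting shared terms. It only shows why taking 
similarity terms from a stored copy leaks information, while taking them from 
an access-checked read does not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not Oak or Lucene code.
class SimilarityProbeSketch {
    // path -> text content of the node
    private final Map<String, String> content = new HashMap<>();
    // paths the current user is allowed to read
    private final Set<String> readable = new HashSet<>();

    void addNode(String path, String text, boolean readableByUser) {
        content.put(path, text);
        if (readableByUser) {
            readable.add(path);
        }
    }

    // Split text into a set of lower-case terms.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                result.add(term);
            }
        }
        return result;
    }

    // Models an index that keeps its own copy of the text:
    // no access check happens on this path.
    Set<String> termsFromStoredCopy(String path) {
        String text = content.get(path);
        return text == null ? new HashSet<>() : terms(text);
    }

    // Models a normal repository read: unreadable nodes yield nothing.
    Set<String> termsFromRepository(String path) {
        if (!readable.contains(path)) {
            return new HashSet<>();
        }
        return termsFromStoredCopy(path);
    }

    // Number of terms the two sets have in common.
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        SimilarityProbeSketch repo = new SimilarityProbeSketch();
        repo.addNode("/secret", "merger with acme planned", false);

        // A dictionary-based probe written by a user who cannot read /secret.
        Set<String> probe = terms("acme merger rumor");

        // Stored copy: the score reveals that two probe terms match.
        System.out.println(overlap(probe, repo.termsFromStoredCopy("/secret"))); // prints 2
        // Access-checked read: nothing is revealed.
        System.out.println(overlap(probe, repo.termsFromRepository("/secret"))); // prints 0
    }
}
```

With the stored copy, each probe gives the attacker a score that depends on the 
secret text. With the access-checked read, every probe against an unreadable 
path scores the same, so nothing can be inferred.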

> Text content should not be stored as part of Index data
> -------------------------------------------------------
>
>                 Key: OAK-1723
>                 URL: https://issues.apache.org/jira/browse/OAK-1723
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Chetan Mehrotra
>            Assignee: Alex Parvulescu
>             Fix For: 1.0, 1.1
>
>
> As part of OAK-319, the Lucene indexer currently stores the indexed content 
> as part of the index. This has an adverse effect on performance, as noted in 
> OAK-1702. To improve performance, this should be disabled.



