[jira] [Comment Edited] (OAK-4566) Multiplexing store support in Lucene Indexes

Chetan Mehrotra (JIRA) Mon, 01 Aug 2016 22:59:08 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401983#comment-15401983
 ]


Chetan Mehrotra edited comment on OAK-4566 at 8/2/16 5:57 AM:
--------------------------------------------------------------

Supporting multiple IndexReaders on the query side involves 2 things:
* Creating individual iterators of LuceneResultRow for each reader and 
combining them
* Handling sorting

The sorting aspect makes things tricky. As the QE (query engine) would not be 
doing the sorting here, we need to ensure that the iterators are merge sorted, 
with the comparison done at the LuceneResultRow level. For that there are 2 options:
# O1 - Do the comparison based on reading the value from the PropertyState. The 
query also has an associated NodeState which can be used to read the value of 
the ordered property, and the comparison can be done based on that. Note that 
the root NodeState bound to the query would be more recent than the NodeState 
at which the index was populated/updated. The node itself might not even exist 
any more. In such a case we might need to rely on the NodeState at which the 
index update was detected. 
# O2 - Make use of doc values, which are stored in the Lucene index, and perform 
the comparison based on the stored value. This would involve accessing the doc 
value of the specific property as the iterator is traversed

Had a discussion with [~teofili] - both approaches are feasible and would need 
a performance benchmark to confirm the result. 

Note that the actual sorting is still taken care of by Lucene. It's just the 
merging of the two iterators that requires a comparison to be performed
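
The merge step described above can be sketched in plain Java. This is only an 
illustration under assumptions, not Oak code: {{MergingIterator}} and its 
{{Head}} helper are hypothetical names, plain strings stand in for 
LuceneResultRow, and the comparator is where either O1 (value read from the 
NodeState) or O2 (value read from doc values) would plug in. Each source 
iterator is assumed to be already sorted by Lucene in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Lazily merges iterators that are each already sorted by the same order. */
final class MergingIterator<T> implements Iterator<T> {

    /** The next unread value of one source, paired with that source. */
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    private final PriorityQueue<Head<T>> heads;

    MergingIterator(List<Iterator<T>> sources, final Comparator<? super T> order) {
        // The heap is keyed on each source's current head value.
        heads = new PriorityQueue<Head<T>>(Math.max(1, sources.size()),
                (a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head<T>(source.next(), source));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heads.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        // Refill the heap from the source that just yielded a value.
        if (smallest.source.hasNext()) {
            heads.add(new Head<T>(smallest.source.next(), smallest.source));
        }
        return smallest.value;
    }

    public static void main(String[] args) {
        // Hypothetical per-reader results, each already sorted.
        List<Iterator<String>> perReader = new ArrayList<Iterator<String>>();
        perReader.add(Arrays.asList("/a", "/c").iterator());
        perReader.add(Arrays.asList("/b", "/d").iterator());
        Iterator<String> merged =
                new MergingIterator<String>(perReader, Comparator.<String>naturalOrder());
        while (merged.hasNext()) {
            System.out.println(merged.next());
        }
    }
}
```

Only one row per reader is held in memory at a time, so the merge stays lazy; 
the cost of each comparison is what would differ between O1 and O2.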

*Update* - With the MultiReader feature suggested by Tommaso we do not need to 
handle sorting on the Oak side, as Lucene would take care of that even in the 
multiple reader scenario. So the above suggestions can be ignored

/cc  [~tmueller] [~alex.parvulescu] [~catholicon]


> Multiplexing store support in Lucene Indexes
> --------------------------------------------
>
>                 Key: OAK-4566
>                 URL: https://issues.apache.org/jira/browse/OAK-4566
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.6
>
>
> Similar to OAK-3403 we need to support the multiplexing store in Lucene indexes. 
> This can be done by having multiple directories under a given index definition. 
> For example, currently the Lucene index files for an index /oak:index/assetIndex 
> are stored in the node /oak:index/assetIndex/:dir. To support multiple indexes 
> which get stored in different stores we can have a structure like
> {noformat}
> /oak:index/assetIndex
>      + :oak:mount1-dir
>      + :dir
> {noformat}
> In the above structure, index content for paths which are part of mount1 would 
> be stored in Lucene files under {{:oak:mount1-dir}} while the rest goes in the 
> default location {{:dir}}
> # *Writing* - At the time of indexing the {{LuceneIndexEditor}} should pick 
> up the correct writer, i.e. the one which is mapped to the right directory 
> node in the repository
> # *Reading* - For reading we would have one {{IndexSearcher}} per directory 
> node; the query would then be executed against both and a joined cursor would 
> be made
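
The writer selection in the *Writing* step can be illustrated with a small 
sketch. It is not the Oak implementation: {{IndexDirectoryNames}}, 
{{addMount}} and {{dirNameFor}} are hypothetical names, the mount root 
{{/libs}} is made up, and the {{:oak:<mount>-dir}} naming is inferred only 
from the {{:oak:mount1-dir}} example in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Chooses the index directory node name for a path being indexed. */
final class IndexDirectoryNames {

    static final String DEFAULT_DIR = ":dir";

    // Mount root path -> mount name, e.g. "/libs" -> "mount1" (hypothetical).
    private final Map<String, String> mountsByRoot = new LinkedHashMap<String, String>();

    void addMount(String mountName, String rootPath) {
        mountsByRoot.put(rootPath, mountName);
    }

    String dirNameFor(String path) {
        for (Map.Entry<String, String> mount : mountsByRoot.entrySet()) {
            String root = mount.getKey();
            // Match the root itself or any descendant, but not siblings like "/libsx".
            if (path.equals(root) || path.startsWith(root + "/")) {
                return ":oak:" + mount.getValue() + "-dir";
            }
        }
        // Paths outside every mount go to the default directory node.
        return DEFAULT_DIR;
    }

    public static void main(String[] args) {
        IndexDirectoryNames names = new IndexDirectoryNames();
        names.addMount("mount1", "/libs");
        System.out.println(names.dirNameFor("/libs/app/component"));
        System.out.println(names.dirNameFor("/content/site/page"));
    }
}
```

An index editor could keep one writer per returned directory name and route 
each document to the writer for its path.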



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
