[
https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017
]
Chetan Mehrotra edited comment on OAK-6535 at 9/25/17 1:24 PM:
---------------------------------------------------------------
This feature is now ready for review.
* On github - See
[here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As single patch - See [here|^OAK-6535-v1.diff]
* See
[wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes]
for more background
h2. Implementation Details
*Indexing*
{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}}, which is invoked for each indexed property change. For this feature we provide a {{PropertyIndexUpdateCallback}}, which performs the property index update according to the property index type.
For non-unique sync indexes it uses {{ContentMirrorStoreStrategy}}, and for unique indexes it uses {{UniqueIndexStoreStrategy}}. See the wiki for the storage format.
For non-unique indexes, default pruning is disabled.
For unique indexes, each index entry also stores a timestamp (as epoch time) in {{jcr:created}}. Note that it is not of type Calendar.
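The indexing flow above can be sketched with a minimal in-memory model. This is not the patch code. {{SketchPropertyUpdateCallback}}, {{SketchNonUniqueIndex}}, {{SketchUniqueIndex}} and {{SketchCallbacks}} are hypothetical names, and property values are plain strings. The real {{PropertyUpdateCallback}} also has a different signature. The sketch shows two things: a callback invoked once per property change and dispatched by index type, and a unique entry that carries its creation time as an epoch-millis {{long}} rather than a Calendar. Rejecting a second path for the same unique value is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical, simplified callback: invoked once per indexed property change.
// before/after are the old and new property values; null means "absent".
interface SketchPropertyUpdateCallback {
    void propertyUpdated(String nodePath, String before, String after);
}

// Non-unique sync index modelled as value -> set of node paths.
final class SketchNonUniqueIndex implements SketchPropertyUpdateCallback {
    final Map<String, Set<String>> entries = new HashMap<>();

    @Override
    public void propertyUpdated(String nodePath, String before, String after) {
        if (before != null) {
            Set<String> paths = entries.get(before);
            if (paths != null) {
                paths.remove(nodePath); // empty sets are kept: no pruning in this sketch
            }
        }
        if (after != null) {
            entries.computeIfAbsent(after, v -> new LinkedHashSet<>()).add(nodePath);
        }
    }
}

// Unique index entry: the path plus its creation time.
final class SketchUniqueEntry {
    final String path;
    final long created; // epoch millis stored as a plain long, not a Calendar

    SketchUniqueEntry(String path, long created) {
        this.path = path;
        this.created = created;
    }
}

// Unique sync index modelled as value -> single entry.
final class SketchUniqueIndex implements SketchPropertyUpdateCallback {
    final Map<String, SketchUniqueEntry> entries = new HashMap<>();
    private final LongSupplier clock;

    SketchUniqueIndex(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public void propertyUpdated(String nodePath, String before, String after) {
        if (before != null) {
            SketchUniqueEntry old = entries.get(before);
            if (old != null && old.path.equals(nodePath)) {
                entries.remove(before);
            }
        }
        if (after != null) {
            SketchUniqueEntry existing = entries.get(after);
            if (existing != null && !existing.path.equals(nodePath)) {
                // Assumption of this sketch: a duplicate value is rejected.
                throw new IllegalStateException(
                        "Value '" + after + "' already indexed at " + existing.path);
            }
            entries.put(after, new SketchUniqueEntry(nodePath, clock.getAsLong()));
        }
    }
}

final class SketchCallbacks {
    // Picks the storage flavour by index type.
    static SketchPropertyUpdateCallback forIndex(boolean unique, LongSupplier clock) {
        return unique ? new SketchUniqueIndex(clock) : new SketchNonUniqueIndex();
    }
}
```

Passing the clock in as a {{LongSupplier}} keeps the timestamp deterministic in tests.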
*Query*
On the query side, {{IndexPlanner}} checks whether the definition supports sync indexes. If so, it determines which sync index can be used. Only one of the sync indexes can be used for a query, chosen by the following rules:
* If any unique index is found, it is given preference
* If multiple non-unique sync indexes are found, the first one is used
For a unique index, the entryCount is set to 1 so that this index reports almost the lowest cost.
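The selection rules above can be sketched as follows. This assumes a hypothetical {{SyncIndexCandidate}} type that only records the index name and whether it is unique. The real {{IndexPlanner}} works on index definitions and full cost estimates.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical description of a sync property index that could serve a query.
final class SyncIndexCandidate {
    final String name;
    final boolean unique;

    SyncIndexCandidate(String name, boolean unique) {
        this.name = name;
        this.unique = unique;
    }
}

final class SyncIndexSelection {
    // Only one sync index is used per query:
    // the first unique one wins, otherwise the first non-unique one.
    static Optional<SyncIndexCandidate> select(List<SyncIndexCandidate> candidates) {
        for (SyncIndexCandidate c : candidates) {
            if (c.unique) {
                return Optional.of(c);
            }
        }
        return candidates.isEmpty() ? Optional.empty() : Optional.of(candidates.get(0));
    }

    // For a unique index the entry count is pinned to 1,
    // which pushes its reported cost close to the minimum.
    static long entryCount(SyncIndexCandidate chosen, long estimatedEntries) {
        return chosen.unique ? 1 : estimatedEntries;
    }
}
```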
After planning, {{LucenePropertyIndex}} checks whether the planner has identified any sync index. If so, it returns a concatenated iterator in which the iterator provided by the property index (via {{HybridPropertyIndexLookup}}) comes first.
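The result ordering can be illustrated with a plain iterator concatenation. {{ConcatenatedIterator}} is a hypothetical name; Guava's {{Iterators.concat}} behaves the same way for two iterators. Sync property index hits are returned first, then Lucene hits. This sketch does not de-duplicate paths that appear in both sources.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Yields every element of 'first' (e.g. sync property index hits)
// before any element of 'second' (e.g. Lucene hits).
// Duplicates are not filtered.
final class ConcatenatedIterator<T> implements Iterator<T> {
    private final Iterator<? extends T> first;
    private final Iterator<? extends T> second;

    ConcatenatedIterator(Iterator<? extends T> first, Iterator<? extends T> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean hasNext() {
        return first.hasNext() || second.hasNext();
    }

    @Override
    public T next() {
        if (first.hasNext()) {
            return first.next();
        }
        if (second.hasNext()) {
            return second.next();
        }
        throw new NoSuchElementException();
    }
}
```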
*Cleanup*
This feature configures a {{PropertyIndexCleaner}} job, which is triggered periodically (by default every 10 min) and does the following:
# For non-unique sync indexes, first change the head bucket if there is any change in the current head bucket state. This is merged.
# For non-unique sync indexes, clean up old orphan buckets.
# For unique indexes, scan the index entries and remove those whose {{jcr:created}} is older than the lastIndexedTo time of the index's indexer lane, i.e. entries that have already been moved to the Lucene index are removed. In doing this it also keeps a threshold, which defaults to 1 hr.
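The following sketch covers only the third cleanup step (unique index entries). It assumes two things: entries are modelled as a map from indexed value to its {{jcr:created}} epoch millis, and the lane's lastIndexedTo time is passed in as epoch millis. {{UniqueIndexCleanupSketch}} is a hypothetical name. The head bucket switch and orphan bucket removal for non-unique indexes are not modelled.

```java
import java.time.Duration;
import java.util.Map;

final class UniqueIndexCleanupSketch {
    // Default grace period described above: 1 hour.
    static final Duration DEFAULT_THRESHOLD = Duration.ofHours(1);

    /**
     * Removes entries whose created time (epoch millis) is older than
     * lastIndexedTo minus the threshold. These entries were already covered
     * by the persisted Lucene index at least 'threshold' ago.
     *
     * @return the number of removed entries
     */
    static int cleanup(Map<String, Long> createdByValue, long lastIndexedTo, Duration threshold) {
        long cutoff = lastIndexedTo - threshold.toMillis();
        int sizeBefore = createdByValue.size();
        createdByValue.values().removeIf(created -> created < cutoff);
        return sizeBefore - createdByValue.size();
    }
}
```

With the default threshold, an entry created at time t is kept until lastIndexedTo passes t + 1 hr.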
*Misc Points*
# Supports relative properties
# Supports non-root indexes
h2. Benchmark
The benchmark can be run via
{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}
Here:
* hybridIndexEnabled=true, syncIndexing=true - Enables this feature, i.e. the 'foo' property is indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode
{noformat}
# HybridIndexTest   C  min  10%  50%  90%  max     N  Searcher  Mutator  Indexed
Oak-Segment-Tar-DS  1    4    6    7    9  527  7992   5385539    39400    49890  #nrt,oakCodec,sync
Oak-Segment-Tar-DS  1    4    6    7   10  114  7462   6834075    34220    46362  #property
Oak-Segment-Tar-DS  1    4    5    6    8  508  9063   4439786    47797    56844  #nrt,oakCodec

numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000,
hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true,
cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}
h2. Pending Stuff
*Open Items*
# Support for nodetype index
# Support for reference index
*Points to discuss*
Apart from the current implementation design, the following aspects need to be discussed:
# Frequency of the cleaner job - Currently it is scheduled to run every 10 mins
# Threshold for unique index cleanup - Currently entries are removed 1 hr after they make it into the persisted Lucene index
[~tmueller] [~catholicon] [~teofili] Please review the patch. I will keep this open for this week so that you have time, and plan to merge next week.
> Synchronous Lucene Property Indexes
> -----------------------------------
>
> Key: OAK-6535
> URL: https://issues.apache.org/jira/browse/OAK-6535
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene, property-index
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8
>
> Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412), which enables near
> real time (NRT) support for Lucene-based indexes. It also had limited
> support for sync indexes. This feature aims to take that to the next level
> and enable support for sync property indexes.
> More details at
> https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes