[
https://issues.apache.org/jira/browse/OAK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036333#comment-16036333
]
Chetan Mehrotra commented on OAK-6246:
--------------------------------------
The above approach took ~45 mins to traverse ~255M documents, compared to ~13 hrs
for 65M docs using NodeStore-based traversal. So natural-order traversal is
definitely at least an order of magnitude faster, and we should move to the next
stage and implement indexing using it.
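As a quick check of the "order of magnitude" claim, the per-document rates can be compared directly. The snippet below only does arithmetic on the approximate figures quoted in this comment; the class and method names are illustrative, not part of oak-run.

```java
public class TraversalThroughput {
    // Documents processed per minute for a given traversal run.
    static double docsPerMinute(double docs, double minutes) {
        return docs / minutes;
    }

    public static void main(String[] args) {
        // Approximate figures from the comment: ~255M docs in ~45 min
        // (natural order) vs ~65M docs in ~13 h (NodeStore traversal).
        double naturalOrder = docsPerMinute(255e6, 45);
        double nodeStore = docsPerMinute(65e6, 13 * 60);
        System.out.printf("natural order: %.0f docs/min%n", naturalOrder);
        System.out.printf("NodeStore:     %.0f docs/min%n", nodeStore);
        System.out.printf("speedup:       ~%.0fx%n", naturalOrder / nodeStore);
    }
}
```

With these inputs the natural-order rate works out to roughly 68 times the NodeStore-based rate, i.e. comfortably more than one order of magnitude, though the inputs themselves are rough.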
While doing this we should also check the impact of OAK-4535, i.e. whether
reading all docs in natural order actually results in reading all docs or not.
If not, we would need to handle that case, say by doing another run where we
re-read docs which have been modified since the traversal was started.
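The "another run" idea can be sketched as a two-pass traversal: record a marker before the first pass, then re-read every doc whose modification stamp is newer than that marker. The sketch below is a minimal in-memory model under stated assumptions: `Store`, `Doc`, the monotonically increasing `modified` counter and the "index" map are all hypothetical stand-ins, not Oak's DocumentStore API, and writes landing during pass 1 are simulated with a callback. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CatchUpTraversal {
    // Hypothetical document: an id plus a monotonically increasing "modified" stamp.
    record Doc(String id, long modified) {}

    // Hypothetical in-memory stand-in for a document collection.
    static final class Store {
        private final Map<String, Doc> docs = new LinkedHashMap<>();
        private long clock = 0;

        long now() { return clock; }

        void write(String id) { docs.put(id, new Doc(id, ++clock)); }

        // Scan over a copy, so writes made during iteration are missed (the risk being modelled).
        List<Doc> scan() { return new ArrayList<>(docs.values()); }

        List<Doc> modifiedSince(long stamp) {
            List<Doc> out = new ArrayList<>();
            for (Doc d : docs.values()) {
                if (d.modified() > stamp) out.add(d);
            }
            return out;
        }
    }

    // Pass 1 reads every doc from a scan; pass 2 re-reads docs modified after
    // pass 1 started. The returned "index" maps doc id to the indexed stamp.
    static Map<String, Long> index(Store store, Runnable writesDuringPass1) {
        Map<String, Long> index = new TreeMap<>();
        long start = store.now();
        boolean first = true;
        for (Doc d : store.scan()) {
            index.put(d.id(), d.modified());
            if (first) { writesDuringPass1.run(); first = false; } // simulate concurrent writes
        }
        for (Doc d : store.modifiedSince(start)) {
            index.put(d.id(), d.modified()); // newer version replaces the pass-1 one
        }
        return index;
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.write("a"); // stamp 1
        store.write("b"); // stamp 2
        store.write("c"); // stamp 3
        Map<String, Long> idx = index(store, () -> {
            store.write("a"); // update during pass 1: stamp 4
            store.write("d"); // insert during pass 1: stamp 5
        });
        System.out.println(idx); // {a=4, b=2, c=3, d=5}
    }
}
```

Pass 1 alone would index the stale `a=1` and miss `d`; the catch-up pass fixes both. Whether a real store exposes a usable modification marker for this query is an assumption the sketch does not settle.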
> Support for out of band indexing with read only access to NodeStore
> -------------------------------------------------------------------
>
> Key: OAK-6246
> URL: https://issues.apache.org/jira/browse/OAK-6246
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> Provide support for out of band indexing where oak-run is connected in read
> only mode with NodeStore and indexes are stored on the file system. These are
> then imported back by the target system.
> Had a discussion with [~catholicon] and the following flow was determined:
> # Admin would provision a checkpoint via CheckpointMBean
> # oak-run index is connected to the NodeStore in read-only mode and is passed
> #* checkpoint from previous step
> #* list of indexes which need to be reindexed
> # oak-run index logic would then proceed with reindexing. However, the created
> index data would be stored locally. This would make use of
> #* DirectoryFactory - OAK-6243
> #* Copy-on-write NodeStore approach as used in OAK-6220
> # Once indexing is completed, it would dump all indexes to an output folder
> along with some metadata
> # Then admin can copy this index data and use an MBean on the target setup to
> "import" it back. This import would need to:
> #* Pause the current async indexers
> #* Import the external index files
> #* Bring the imported external indexes up to date with their respective lane's
> checkpoint
> #* Resume the async indexers
> The benefits of this approach are:
> # We only need to backport the import logic. Everything else can be implemented
> in trunk and need not be backported.
> # Using read-only mode allows oak-run from trunk to be safely connected to any
> of the old versions
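The import step on the target system (pause async indexers, import the external index files, catch up to each lane's checkpoint, resume) can be sketched as an ordered sequence. In this minimal sketch, `ImportSteps` and its methods are hypothetical hooks, not Oak's actual MBean or indexing API, and the `/tmp/indexes` path is a placeholder. Resuming the async indexers in a `finally` block even when a step fails is a design choice of this sketch; the issue itself does not specify failure handling.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexImportFlow {
    // Hypothetical hooks on the target system; not Oak's real API.
    interface ImportSteps {
        void pauseAsyncIndexers();
        void importIndexFiles(String dir);
        void catchUpToLaneCheckpoints();
        void resumeAsyncIndexers();
    }

    // Runs the import steps in order; async indexers are resumed even if a step fails.
    static void runImport(ImportSteps steps, String dir) {
        steps.pauseAsyncIndexers();
        try {
            steps.importIndexFiles(dir);
            steps.catchUpToLaneCheckpoints();
        } finally {
            steps.resumeAsyncIndexers();
        }
    }

    // Test double that records each call, optionally failing during import.
    static final class Recording implements ImportSteps {
        final List<String> log = new ArrayList<>();
        final boolean failImport;

        Recording(boolean failImport) { this.failImport = failImport; }

        public void pauseAsyncIndexers() { log.add("pause"); }
        public void importIndexFiles(String dir) {
            log.add("import " + dir);
            if (failImport) throw new IllegalStateException("import failed");
        }
        public void catchUpToLaneCheckpoints() { log.add("catch-up"); }
        public void resumeAsyncIndexers() { log.add("resume"); }
    }

    public static void main(String[] args) {
        Recording ok = new Recording(false);
        runImport(ok, "/tmp/indexes");
        System.out.println(ok.log); // [pause, import /tmp/indexes, catch-up, resume]

        Recording failing = new Recording(true);
        try {
            runImport(failing, "/tmp/indexes");
        } catch (IllegalStateException e) {
            System.out.println(failing.log); // [pause, import /tmp/indexes, resume]
        }
    }
}
```

Keeping the whole sequence behind one entry point also matches the stated benefit that only the import logic needs backporting: an older branch would only have to implement these four hooks.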
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)