[
https://issues.apache.org/jira/browse/OAK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034442#comment-16034442
]
Thomas Mueller commented on OAK-6246:
-------------------------------------
Indexing currently traverses nodes in path order (I assume depth-first
traversal). It might be much faster to traverse them in the order in which
they are stored in the low-level storage. This would replace a lot of random
disk I/O with sequential I/O.
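To illustrate the idea, here is a toy model (not Oak code). It assumes that a node's storage position is unrelated to its path, and counts how many reads do not directly follow the previous read on "disk". The paths, the layout and the count_seeks helper are all made up for illustration.

```python
import random

def count_seeks(read_order, position_of):
    """Count reads whose storage position does not directly follow the previous read."""
    seeks = 0
    previous = None
    for node in read_order:
        pos = position_of[node]
        if previous is None or pos != previous + 1:
            seeks += 1
        previous = pos
    return seeks

# Hypothetical repository: 1000 node paths, sorted to stand in for path-order traversal.
paths = sorted(f"/content/site{i % 10}/page{i}" for i in range(1000))

# Assumption: storage position is unrelated to path (e.g. nodes written at different times).
rng = random.Random(42)
storage_order = paths[:]
rng.shuffle(storage_order)
position_of = {path: pos for pos, path in enumerate(storage_order)}

path_order_seeks = count_seeks(paths, position_of)
storage_order_seeks = count_seeks(storage_order, position_of)
print("path order:", path_order_seeks, "storage order:", storage_order_seeks)
```

In this model, reading in storage order needs a single initial seek, while path order jumps on almost every read. Real storage engines cache and batch reads, so the real gap would need to be measured.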
[~ianeboston] mentioned that we could easily test this on MongoDB by running
db.nodes.find() on the command line, piped to /dev/null, on the MongoDB primary
and remotely (I hope I got that right, Ian?). It would be best to test this on
both SSD and HDD, and compare against "regular" node traversal.
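A rough sketch of how such a measurement could be scripted instead of piping to /dev/null: a helper that consumes any iterable of documents, discards them, and reports the count and elapsed time. The helper name and the stand-in data are assumptions. With a real repository, the iterable would be a driver cursor over the nodes collection.

```python
import time

def time_traversal(documents):
    """Consume an iterable of documents, discarding each one; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in documents:
        count += 1
    return count, time.perf_counter() - start

# Stand-in data only; the "_id" values here are made up for illustration.
fake_nodes = ({"_id": f"/node{i}"} for i in range(10_000))

count, seconds = time_traversal(fake_nodes)
print(f"{count} documents in {seconds:.3f}s")
```

Running the same helper once over a storage-order cursor and once over the regular traversal, on SSD and on HDD, would give directly comparable numbers.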
The disadvantage is that with low-level traversal, we can't easily exclude
subtrees (including hidden nodes). But possibly the benefit is higher than the
drawback.
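A minimal sketch of what exclusion could look like with low-level traversal: since nodes arrive as a flat stream in no particular path order, each one has to be checked individually rather than skipping a whole subtree at once. The function names are hypothetical, and the check assumes hidden nodes can be recognised by a name starting with ':'.

```python
def is_hidden(path):
    """True if any path segment starts with ':' (assumed hidden-node naming)."""
    return any(seg.startswith(":") for seg in path.split("/") if seg)

def is_under(path, root):
    """True if path equals root or lies beneath it."""
    return path == root or path.startswith(root.rstrip("/") + "/")

def included(paths, excluded_roots):
    """Yield paths from a flat stream, skipping excluded subtrees and hidden nodes."""
    for path in paths:
        if is_hidden(path):
            continue
        if any(is_under(path, root) for root in excluded_roots):
            continue
        yield path

# Made-up paths, deliberately not in path order.
stream = ["/content/a", "/tmp/x", "/oak:index/foo/:data",
          "/content/b", "/tmp", "/tmpfiles/y"]
kept = list(included(stream, excluded_roots=["/tmp"]))
print(kept)
```

Note that excluded nodes are still read from storage before being dropped, which is the cost being weighed against the sequential I/O gain.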
> Support for out of band indexing with read only access to NodeStore
> -------------------------------------------------------------------
>
> Key: OAK-6246
> URL: https://issues.apache.org/jira/browse/OAK-6246
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> Provide support for out of band indexing where oak-run is connected to the
> NodeStore in read-only mode and indexes are stored on the file system. These
> are then imported back by the target system.
> Had a discussion with [~catholicon] and the following flow was determined:
> # Admin would provision a checkpoint via CheckpointMBean
> # oak-run index is connected to the NodeStore in read-only mode and is passed
> #* the checkpoint from the previous step
> #* the list of indexes which need to be reindexed
> # oak-run index logic would then proceed with reindexing. However the created
> index data would be stored locally. This would make use of
> #* DirectoryFactory - OAK-6243
> #* Copy-on-write nodestore approach as being used in OAK-6220
> # Once indexing is completed it would dump all indexes to an output folder
> with some metadata
> # Then admin can copy this index data and use an MBean on the target setup to
> "import" it back. This import would need to
> #* Pause the current async indexers
> #* Import the external index files
> #* Bring the external indexes up to date with their respective lane's checkpoint
> #* Resume the async indexer
> The benefit of this approach is that
> # We only need to backport the import logic. Everything else can be
> implemented in trunk and need not be backported.
> # Using read-only mode allows oak-run from trunk to be safely connected to any
> of the old versions.
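
The import step in the flow above can be sketched as follows. This is only an illustration of the ordering: ImportTarget and all of its method names are hypothetical stand-ins, not Oak APIs. Using try/finally so that the async indexers are resumed even when the import fails is an assumption of this sketch, not something stated in the issue.

```python
class ImportTarget:
    """Hypothetical stand-in for the target setup; it only records the calls made."""

    def __init__(self):
        self.log = []

    def pause_async_indexers(self):
        self.log.append("pause")

    def import_index_files(self, index_dir):
        self.log.append(f"import {index_dir}")

    def catch_up_to_lane_checkpoints(self):
        self.log.append("catch-up")

    def resume_async_indexers(self):
        self.log.append("resume")


def import_index(target, index_dir):
    """Run the four import sub-steps in order, always resuming the indexers."""
    target.pause_async_indexers()
    try:
        target.import_index_files(index_dir)
        target.catch_up_to_lane_checkpoints()
    finally:
        target.resume_async_indexers()


target = ImportTarget()
import_index(target, "/path/to/index-output")  # placeholder path
print(target.log)
```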