oak-run-indexing.md

chetanm Mon, 19 Jun 2017 05:01:30 -0700

Author: chetanm
Date: Mon Jun 19 12:01:09 2017
New Revision: 1799184

URL: http://svn.apache.org/viewvc?rev=1799184&view=rev
Log:
OAK-6081 - Indexing tooling via oak-run


Document currently supported operations

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md
URL: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md?rev=1799184&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md 
(added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md 
Mon Jun 19 12:01:09 2017
@@ -0,0 +1,111 @@
+# Oak Run Indexing
+
+`@since Oak 1.7.0`
+
+**Work in progress. Not to be used on production setups**
+
+With Oak 1.7 we have added some tooling as part of oak-run `index` command. 
Below are details around various
+operations supported by this command.
+
+The `index` command supports connecting to different NodeStores via various 
options which are documented 
+[here](../features/oak-run-nodestore-connection-options.md). Example below 
assume a setup consisting of 
+SegmentNodeStore and FileDataStore. Depending on setup use the appropriate 
connection options.
+ 
+Unless specified all operations connect to the repository in read only mode
+
+## Common Options
+
+All the commands support following common options
+
+1. `--index-paths` - Comma separated list of index paths for which the 
selected operations need to be performed. If
+   not specified then the operation would be performed against all the indexes.
+   
+Also refer to help output via `-h` command for some other options
+
+## Generate Index Info
+
+    java -jar oak-run*.jar index --fds-path=/path/to/datastore  
/path/to/segmentstore/ --index-info 
+
+Generates a report consisting of various stats related to indexes present in 
the given repository. The generated
+report is stored by default in `index-info.txt`
+
+Supported for all index types
+
+## Dump Index Definitions
+
+    java -jar oak-run*.jar index --fds-path=/path/to/datastore  
/path/to/segmentstore/ --index-definitions
+     
+`--index-definitions` operation dumps the index definition in json format to a 
file `index-definitions.json`. The json
+file contains index definitions keyed against the index paths
+
+Supported for all index types
+
+## Dump Index Data
+
+    java -jar oak-run*.jar index --fds-path=/path/to/datastore  
/path/to/segmentstore/ --index-dump
+     
+`--index-dump` operation dumps the index content in output directory. The 
output directory would contain one folder for 
+each index. Each folder would have a property file `index-details.txt` which 
contains `indexPath`
+
+Supported for only Lucene indexes.
+
+## Index Consistency Check
+
+    java -jar oak-run*.jar index --fds-path=/path/to/datastore  
/path/to/segmentstore/ --index-consistency-check
+    
+`--index-consistency-check` operation performs index consistency check against 
various indexes. It supports 2 level
+
+* Level 1 - Specified as `--index-consistency-check=1`. Performs a basic check 
to determine if all blobs referred in index
+  are valid
+* Level 2 - Specified as `--index-consistency-check=2`. Performs a more 
through check to determine if all index files
+  are valid and no corruption has happened. This check is slower
+
+Supported for only Lucene indexes.
+
+## Reindex
+
+The reindex operation supports 2 modes of index
+
+* Online Indexing - Here oak-run would connect to repository in `--read-write` 
mode
+* Out-of-band indexing - Here oak-run would connect to repository in read only 
mode. It would require certain manual steps
+
+Supported for only Lucene indexes.
+
+### out-of-band indexing
+
+Out of band indexing has following phases
+
+1. Get checkpoint issued 
+2. Perform indexing with read only connection to NodeStore upto checkpoint 
state
+3. Import the generated indexes and complete the increment indexing from 
checkpoint state to current head
+
+
+#### Step 1 - Text PreExtraction
+
+If the index being reindexed involves fulltext index and the repository has 
binary content then its recommended
+that first  [text pre-extraction](pre-extract-text.md) is performed. This 
ensures that costly operation around text
+extraction is done prior to actual indexing so that actual indexing does not 
do text extraction in critical path
+
+#### Step 2 - Create Checkpoint
+
+Go to `CheckpointMBean` and create a checkpoint with lifetime of 1 month. 
<<TBD>>
+
+#### Step 3 - Perform Reindex
+
+In this step we perform the actual indexing via oak-run where it connects to 
repository in read only mode. 
+    
+     java -jar oak-run*.jar index --fds-path=/path/to/datastore  
/path/to/segmentstore/ --reindex --index-paths=/oak:index/indexName
+     
+Here following options can be used
+
+* `--pre-extracted-text-dir` - Directory path containing pre extracted text 
generated via step #1
+* `--index-paths` - This command requires an explicit set of index paths which 
need to be indexed
+* `--checkpoint` - The checkpoint up to which the index is updated, when 
indexing in read only mode. For
+  testing purpose, it can be set to 'head' to indicate that the head state 
should be used.
+
+
+
+
+     
+     
+     
\ No newline at end of file

svn commit: r1799184 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/oak-run-indexing.md

Reply via email to