While working on copy-on-read directory support (OAK-1724), I was
checking how Lucene manages its index files. The following observations
come from various test runs.

A - Small index uses the compound file format
------------------

If the index contains only a few entries, Lucene seems to use the
compound file format, as the directory listing shows only the following
files (filename - size)

_0.cfs - 621
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266

If the index gets updated, the size of the _0.cfs file changes and the
other files remain the same.

B - Large index stores index files separately
--------------------

For a large index (I am not sure of the threshold) Lucene seems to store
the various index files separately. In that case existing files
probably do not get modified and only new files get created.
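
The observations in A and B can be checked across runs by taking a
listing of the index directory before and after an update and comparing
the two. Below is a minimal sketch of that idea using only the JDK. The
class and method names (IndexDirSnapshot, snapshot, diff) are made up
for illustration and are not part of Oak or Lucene. It compares file
sizes only, so a file rewritten in place with the same length would not
be detected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not an Oak or Lucene class.
public class IndexDirSnapshot {

    /** Returns file name -> size for every regular file directly under dir. */
    static Map<String, Long> snapshot(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    /** Reports per file how 'after' differs from 'before': ADDED, REMOVED or RESIZED. */
    static Map<String, String> diff(Map<String, Long> before, Map<String, Long> after) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            if (old == null) {
                changes.put(e.getKey(), "ADDED");
            } else if (!old.equals(e.getValue())) {
                changes.put(e.getKey(), "RESIZED");
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                changes.put(name, "REMOVED");
            }
        }
        return changes;
    }
}
```

Calling snapshot() on the index directory before and after an index
update and passing both maps to diff() shows whether only new files
appeared or whether existing files changed size.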

Question
-------------
1. Is this switch from the cfs format to storing separate files
automatic, done by Lucene once the index reaches a certain size? Or is
it something done specifically in Oak?
2. Is it correct that Lucene would not modify an existing file in a
directory, except as follows?
  a. With compound storage the cfs file would get modified. Would that
modification be append only?
  b. segments.gen - This would get modified every time
  c. If separate files are used, then no existing file would ever be
modified and only new files would be created

Chetan Mehrotra
PS: The question is probably more appropriate for the Lucene DL, but I
am checking here first to see if something in Oak is different from the
default
