While working on copy-on-read directory support (OAK-1724), I was checking how Lucene manages the index files. The following observations were made over various test runs:
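The observations below come from comparing directory listings across test runs. Here is a minimal Java sketch of that comparison. It uses only java.nio and assumes a local copy of the index files, as a copy-on-read directory would produce. The class `IndexFileDiff` and its methods are hypothetical helpers, not Oak or Lucene API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper: snapshots a local index directory (file name -> size)
// and diffs two snapshots to see which files were added, removed or changed.
public class IndexFileDiff {

    // Result of comparing two snapshots; sets are sorted for stable output.
    public record Diff(Set<String> added, Set<String> removed, Set<String> sizeChanged) {}

    // List regular files directly under dir with their sizes in bytes.
    public static Map<String, Long> snapshot(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    // Compare two snapshots. Only sizes are compared, so a file rewritten
    // in place with the same length would NOT be reported as changed.
    public static Diff diff(Map<String, Long> before, Map<String, Long> after) {
        Set<String> added = new TreeSet<>();
        Set<String> removed = new TreeSet<>();
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            if (old == null) {
                added.add(e.getKey());
            } else if (!old.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                removed.add(name);
            }
        }
        return new Diff(added, removed, changed);
    }

    // Heuristic only: a .cfs file in the listing suggests that at least one
    // segment was written in the compound file format.
    public static boolean hasCompoundSegment(Map<String, Long> snapshot) {
        return snapshot.keySet().stream().anyMatch(n -> n.endsWith(".cfs"));
    }

    // Usage: java IndexFileDiff <indexCopyBefore> <indexCopyAfter>
    public static void main(String[] args) throws IOException {
        Map<String, Long> before = snapshot(Paths.get(args[0]));
        Map<String, Long> after = snapshot(Paths.get(args[1]));
        System.out.println("compound segment before: " + hasCompoundSegment(before));
        System.out.println(diff(before, after));
    }
}
```

Running it against two copies of the index directory, taken before and after an update, reports which files were added, removed or changed in size. Comparing sizes alone is a deliberate simplification. It cannot detect an in-place rewrite that keeps the same length, so a checksum would be needed to confirm that files other than the .cfs are truly unchanged.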
A - Small index uses the compound file format
---------------------------------------------

If the index contains only a few entries, Lucene seems to use the compound file format. The directory listing shows only the following files (filename - size):

    _0.cfs - 621
    _0.cfe - 194
    segments.gen - 20
    segments_1 - 81
    _0.si - 266

When the index gets updated, the size of _0.cfs changes while the other files remain the same.

B - Large index stores index files separately
---------------------------------------------

For a large index (I am not sure of the threshold), Lucene seems to store the various index files separately. In that case the existing files probably do not get modified, and only new files get created.

Questions
---------

1. Is this switch from the cfs format to separate files automatic, done by Lucene once the index reaches a certain size? Or is it something Oak does specifically?

2. Is it correct that Lucene does not modify existing files in a directory, except as follows?
   a. With compound storage, the cfs file gets modified. Is that modification append-only?
   b. segments.gen gets modified every time.
   c. If separate files are used, no file is ever modified; only new files are created.

Chetan Mehrotra

PS: This question is probably more appropriate for the Lucene DL, but I am checking here first to see whether Oak does anything differently from the default.
