Hi Marcel,

> in my experience .cfs files are written once and never modified

I have checked in a test case with [1]. If you run it, you will see the
following output, which indicates that the same file is getting updated.

----
================
_0.cfs - 621
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266
================
_0.cfs - 789
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266
================
_0.cfs - 952
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266
================
_0.cfs - 789
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266
================
_0.cfs - 955
_0.cfe - 194
segments.gen - 20
segments_1 - 81
_0.si - 266
---------

Chetan Mehrotra
[1] http://svn.apache.org/r1633123


On Mon, Oct 20, 2014 at 5:34 PM, Thomas Mueller <[email protected]> wrote:
> Hi,
>
> This blog post is interesting: they are using a physical switch (similar
> to a Christmas light timer) to test that a Lucene index doesn't get
> corrupted on power failure. It would be nice if we can do something
> similar with the Segment storage at some point.
>
> Regards,
> Thomas
>
>
>
> On 20/10/14 13:36, "Marcel Reutegger" <[email protected]> wrote:
>
>>Hi,
>>
>>this is very strange. in my experience .cfs files are written once
>>and never modified. this write-once pattern is actually used for
>>almost all files, except the segments.gen file you mentioned. E.g.
>>see [0] by Mike McCandless when he talks about LUCENE-5574.
>>
>>is it possible the entire lucene index is replaced by oak?
>>
>>regards
>> marcel
>>
>>[0]
>>http://blog.mikemccandless.com/2014/04/testing-lucenes-index-durability-after.html
>>
>>On 20/10/14 11:59, "Chetan Mehrotra" <[email protected]> wrote:
>>
>>>While working on copy-on-read directory support (OAK-1724), I was
>>>checking how Lucene manages the index files. The following
>>>observations can be made from various test runs:
>>>
>>>A - Small index uses compound file format
>>>------------------
>>>
>>>If the index contains few entries, it seems to use the compound file
>>>format, as the directory listing shows only the following files
>>>(filename - size)
>>>
>>>_0.cfs - 621
>>>_0.cfe - 194
>>>segments.gen - 20
>>>segments_1 - 81
>>>_0.si - 266
>>>
>>>If the index gets updated, the _0.cfs file size changes and the
>>>others remain the same
>>>
>>>B - Large index stores index files separately
>>>--------------------
>>>
>>>For a large index (not sure of the threshold) Lucene seems to store
>>>the various index files separately, and there the files probably do
>>>not get modified and only new files get created
>>>
>>>Question
>>>-------------
>>>1. Is this switch from the cfs format to storing in separate files
>>>automatic and done by Lucene after the index reaches a certain size?
>>>Or is this done specifically in Oak?
>>>2. Lucene would not modify an existing file in a directory unless
>>> a. In compound storage the cfs file would get modified. Would the
>>>modification there also be append only?
>>> b. segments.gen - This would get modified every time
>>> c. If separate files are used then no file would ever be modified
>>>and only new files would be created
>>>
>>>Chetan Mehrotra
>>>PS: Probably the question is more appropriate for the Lucene DL, but
>>>checking here first to see if something in Oak is different from the
>>>default
>>
>
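
The listings in this thread were read by comparing file names and sizes
between runs. A minimal sketch of that comparison in plain Java follows.
DirSnapshot, snapshot and diff are hypothetical names used only for
illustration; they are not part of Lucene or Oak, and this is not the test
case checked in with [1]. snapshot assumes the index files are visible in a
plain filesystem directory. Comparing sizes alone will miss a rewrite that
leaves the length unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Lucene or Oak.
final class DirSnapshot {

    // Record "file name -> size in bytes" for each regular file directly in dir.
    static Map<String, Long> snapshot(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    // Classify every file seen in either snapshot as
    // "created", "modified", "unchanged" or "deleted".
    static Map<String, String> diff(Map<String, Long> before,
                                    Map<String, Long> after) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long oldSize = before.get(e.getKey());
            if (oldSize == null) {
                result.put(e.getKey(), "created");
            } else if (!oldSize.equals(e.getValue())) {
                // Same name, different size: the file was changed in place
                // or replaced under the same name.
                result.put(e.getKey(), "modified");
            } else {
                result.put(e.getKey(), "unchanged");
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                result.put(name, "deleted");
            }
        }
        return result;
    }
}
```

Fed with the first two listings from the test output, diff reports _0.cfs
as modified and the other four files as unchanged. A strictly write-once
directory would only ever produce "created" and "deleted" entries.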
