> The index indeed gets rebuilt. In IndexUpdate.collectIndexEditors() the
> provider

I had not realized that. So it can be safely assumed that files created
in the Lucene index

1. Never get updated
2. Never have their names reused

That would simplify the logic for CopyOnRead a lot. Now the only thing to
take care of is the reindex case, i.e. clearing the local Lucene replica.
That was discussed in another thread [1]. I will try to implement what
Thomas suggested there.
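Given those two assumptions, the copy-on-read check can be sketched as follows. This is a minimal, hypothetical sketch, not Oak's actual CopyOnRead code: the class `CopyOnReadSketch`, its `open` method, and the use of two plain `java.nio.file` directories standing in for the repository-backed index and the local replica are all assumptions made for illustration. Because files are assumed to be written once and never to reuse a name, a local file with the same name and length is treated as current, so no content comparison is needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not Oak's CopyOnRead implementation. It assumes that
// index files are written once and that their names are never reused.
class CopyOnReadSketch {
    private final Path remoteDir; // stands in for the repository-backed index
    private final Path localDir;  // local on-disk replica

    CopyOnReadSketch(Path remoteDir, Path localDir) {
        this.remoteDir = remoteDir;
        this.localDir = localDir;
    }

    // Returns the local copy of the named file, copying it on first access.
    Path open(String name) throws IOException {
        Path remote = remoteDir.resolve(name);
        Path local = localDir.resolve(name);
        // Under the write-once assumption, same name and same length means
        // the local copy is current; no content comparison is done.
        if (Files.exists(local) && Files.size(local) == Files.size(remote)) {
            return local;
        }
        // Copy to a temporary file in the same directory first, then move it
        // into place, instead of writing under the final name directly.
        Path tmp = Files.createTempFile(localDir, name, ".tmp");
        Files.copy(remote, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, local, StandardCopyOption.REPLACE_EXISTING);
        return local;
    }
}
```

The reindex case is deliberately not handled here: after a rebuild the same name can come back with different content, which is exactly what breaks the "name plus length" shortcut.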

Thanks, Marcel!

Chetan Mehrotra
[1] http://markmail.org/thread/dzvy7zumcdkegrgz


On Tue, Oct 21, 2014 at 12:20 PM, Alex Parvulescu
<[email protected]> wrote:
>> The index indeed gets rebuilt. In IndexUpdate.collectIndexEditors() the
>> provider does not return any editors and the following code is executed
>
> OAK-2203
>
> On Tue, Oct 21, 2014 at 8:37 AM, Marcel Reutegger <[email protected]>
> wrote:
>
>> Hi,
>>
>> this is the output when I run it on my machine within IntelliJ:
>>
>> 17:13:10.035 [main] INFO  o.a.j.oak.plugins.index.IndexUpdate - Reindexing will be performed for following indexes: [/oak:index/lucene]
>> 17:13:10.172 [main] DEBUG o.a.j.o.p.i.lucene.LuceneIndexEditor - Indexed 1 nodes, done.
>> ================
>> _0.cfs - 621
>> _0.cfe - 194
>> segments.gen - 20
>> segments_1 - 81
>> _0.si - 252
>> 17:13:10.187 [main] INFO  o.a.j.oak.plugins.index.IndexUpdate - Reindexing will be performed for following indexes: [/oak:index/lucene]
>> 17:13:10.200 [main] DEBUG o.a.j.o.p.i.lucene.LuceneIndexEditor - Indexed 2 nodes, done.
>> ================
>> _0.cfs - 789
>> _0.cfe - 194
>> segments.gen - 20
>> segments_1 - 81
>> _0.si - 252
>> 17:13:10.204 [main] INFO  o.a.j.oak.plugins.index.IndexUpdate - Reindexing will be performed for following indexes: [/oak:index/lucene]
>> 17:13:10.220 [main] DEBUG o.a.j.o.p.i.lucene.LuceneIndexEditor - Indexed 3 nodes, done.
>> ================
>> _0.cfs - 952
>> _0.cfe - 194
>> segments.gen - 20
>> segments_1 - 81
>> _0.si - 252
>> 17:13:10.223 [main] INFO  o.a.j.oak.plugins.index.IndexUpdate - Reindexing will be performed for following indexes: [/oak:index/lucene]
>> 17:13:10.238 [main] DEBUG o.a.j.o.p.i.lucene.LuceneIndexEditor - Indexed 2 nodes, done.
>> ================
>> _0.cfs - 789
>> _0.cfe - 194
>> segments.gen - 20
>> segments_1 - 81
>> _0.si - 252
>> 17:13:10.241 [main] INFO  o.a.j.oak.plugins.index.IndexUpdate - Reindexing will be performed for following indexes: [/oak:index/lucene]
>> 17:13:10.256 [main] DEBUG o.a.j.o.p.i.lucene.LuceneIndexEditor - Indexed 3 nodes, done.
>> ================
>> _0.cfs - 955
>> _0.cfe - 194
>> segments.gen - 20
>> segments_1 - 81
>> _0.si - 252
>>
>> The index indeed gets rebuilt. In IndexUpdate.collectIndexEditors() the
>> provider does not return any editors and the following code is executed:
>>
>> Editor editor = provider.getIndexEditor(type, definition, root,
>>         updateCallback);
>> if (editor == null) {
>>     // trigger reindexing when an indexer becomes available
>>     definition.setProperty(REINDEX_PROPERTY_NAME, true);
>> } else ...
>>
>>
>> We need to detect a reindex and clear the Lucene replica on the local disk.
>> As we can see, Lucene starts again with generation zero and increments it
>> with every modification. This will eventually lead to a collision with the
>> replica on the local disk. In this extreme case, it even happens with every
>> modification ;)
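A rough sketch of such a collision check, using the sizes from the output above, where `_0.cfs` goes from 952 to 789 bytes after a reindex while keeping its name. The class `ReplicaCheckSketch` and its methods are hypothetical, not Oak code: the replica is flagged as stale when a file name exists both in the repository index and locally but with different lengths, and the local files are then deleted.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical sketch, not Oak code: detects a reused file name with a
// different length and wipes the local replica.
class ReplicaCheckSketch {
    // True if some name exists on both sides with a different length. Under
    // the write-once assumption this only happens after a reindex.
    static boolean isStale(Map<String, Long> remoteSizes, Map<String, Long> localSizes) {
        for (Map.Entry<String, Long> e : remoteSizes.entrySet()) {
            Long localSize = localSizes.get(e.getKey());
            if (localSize != null && !localSize.equals(e.getValue())) {
                return true;
            }
        }
        return false;
    }

    // Deletes the regular files directly inside the replica directory
    // (not recursive).
    static void clearReplica(Path localDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(localDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.delete(f);
                }
            }
        }
    }
}
```

A length comparison is only a heuristic: a rebuilt file could by chance have the same length as the old one. Reacting to an explicit reindex signal, such as the reindex flag set in the code quoted above, would be more reliable.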
>>
>> Regards
>>  Marcel
>>
>> On 20/10/14 14:24, "Chetan Mehrotra" <[email protected]> wrote:
>>
>> >Hi Marcel,
>> >
>> >> in my experience .cfs files are written once
>> >> and never modified
>> >
>> >I have checked in a test case with [1]; if you run it, you will see the
>> >following output, which indicates that the same file is getting updated.
>> >
>> >----
>> >================
>> >_0.cfs - 621
>> >_0.cfe - 194
>> >segments.gen - 20
>> >segments_1 - 81
>> >_0.si - 266
>> >================
>> >_0.cfs - 789
>> >_0.cfe - 194
>> >segments.gen - 20
>> >segments_1 - 81
>> >_0.si - 266
>> >================
>> >_0.cfs - 952
>> >_0.cfe - 194
>> >segments.gen - 20
>> >segments_1 - 81
>> >_0.si - 266
>> >================
>> >_0.cfs - 789
>> >_0.cfe - 194
>> >segments.gen - 20
>> >segments_1 - 81
>> >_0.si - 266
>> >================
>> >_0.cfs - 955
>> >_0.cfe - 194
>> >segments.gen - 20
>> >segments_1 - 81
>> >_0.si - 266
>> >---------
>> >
>> >Chetan Mehrotra
>> >[1] http://svn.apache.org/r1633123
>> >
>> >
>> >On Mon, Oct 20, 2014 at 5:34 PM, Thomas Mueller <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> This blog post is interesting: they use a physical switch (similar to a
>> >> Christmas light timer) to test that a Lucene index doesn't get corrupted
>> >> on power failure. It would be nice if we could do something similar with
>> >> the Segment storage at some point.
>> >>
>> >> Regards,
>> >> Thomas
>> >>
>> >>
>> >>
>> >> On 20/10/14 13:36, "Marcel Reutegger" <[email protected]> wrote:
>> >>
>> >>>Hi,
>> >>>
>> >>>This is very strange. In my experience .cfs files are written once
>> >>>and never modified. This write-once pattern is actually used for
>> >>>almost all files, except the segments.gen file you mentioned. E.g.
>> >>>see [0] by Mike McCandless, where he talks about LUCENE-5574.
>> >>>
>> >>>Is it possible that the entire Lucene index is replaced by Oak?
>> >>>
>> >>>regards
>> >>> marcel
>> >>>
>> >>>[0]
>> >>>http://blog.mikemccandless.com/2014/04/testing-lucenes-index-durability-after.html
>> >>>
>> >>>On 20/10/14 11:59, "Chetan Mehrotra" <[email protected]> wrote:
>> >>>
>> >>>>While working on copy-on-read directory support (OAK-1724), I was
>> >>>>checking how Lucene manages the index files. The following observations
>> >>>>can be made from various test runs:
>> >>>>
>> >>>>A - Small index uses the Compound File format
>> >>>>------------------
>> >>>>
>> >>>>If the index contains few entries, it seems to use the compound file
>> >>>>format, as the directory listing shows only the following files
>> >>>>(filename - size):
>> >>>>
>> >>>>_0.cfs - 621
>> >>>>_0.cfe - 194
>> >>>>segments.gen - 20
>> >>>>segments_1 - 81
>> >>>>_0.si - 266
>> >>>>
>> >>>>If the index gets updated, the size of the _0.cfs file changes and the
>> >>>>other files remain the same.
>> >>>>
>> >>>>B - Large index stores index files separately
>> >>>>--------------------
>> >>>>
>> >>>>For a large index (not sure of the threshold), Lucene seems to store the
>> >>>>various index files separately; there the files probably do not get
>> >>>>modified and only new files get created.
>> >>>>
>> >>>>Question
>> >>>>-------------
>> >>>>1. Is this switch from the cfs format to separate files automatic, i.e.
>> >>>>done by Lucene once the index reaches a certain size, or is it done
>> >>>>specifically in Oak?
>> >>>>2. Lucene would not modify an existing file in a directory, except:
>> >>>>  a. With compound storage, the cfs file would get modified. Would that
>> >>>>modification also be append-only?
>> >>>>  b. segments.gen would get modified every time.
>> >>>>  c. If separate files are used, no file would ever be modified and
>> >>>>only new files would be created.
>> >>>>
>> >>>>Chetan Mehrotra
>> >>>>PS: The question is probably more appropriate for the Lucene DL, but I
>> >>>>am checking here first to see if Oak does anything differently from
>> >>>>the default.
>> >>>
>> >>
>>
>>
