Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

Chetan Mehrotra Tue, 10 Mar 2015 04:39:57 -0700

That's one approach we can think about. Thinking further, with Lucene's
design of immutable files things become simpler (ignoring the reindex
case). In normal usage Lucene never reuses a file name and never
modifies an existing file, so we would not have to worry about
reading older revisions. We only need to keep track of deleted files
and the blobs referred to by them.


So once a file node is marked as deleted we can possibly perform a diff
(we already do this to detect when the index has changed) and
collect the blobIds of the deleted file nodes from the previous state.
Those can be safely deleted *after* some time (allowing other cluster
nodes to pick up the change).
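
A rough sketch of the idea, not actual Oak code: the class and method
names below are hypothetical, and the index directory is modelled as a
plain map from file name to the blobIds backing that file. It assumes
file names are never reused (so a name-only diff is enough) and that a
blob is not shared with a file in some other index definition.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; none of these names exist in Oak.
class DeletedIndexBlobCollector {
    // blobId -> time (ms) at which its file was first seen as deleted
    private final Map<String, Long> pending = new LinkedHashMap<>();
    private final long graceMillis;

    DeletedIndexBlobCollector(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Diff two directory states: collect blobIds of files that exist in
    // 'before' but not in 'after', skipping any blob still referenced
    // by a file that is present in 'after'.
    static Set<String> blobsOfDeletedFiles(Map<String, List<String>> before,
                                           Map<String, List<String>> after) {
        Set<String> deleted = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : before.entrySet()) {
            if (!after.containsKey(e.getKey())) {
                deleted.addAll(e.getValue());
            }
        }
        for (List<String> live : after.values()) {
            deleted.removeAll(live);
        }
        return deleted;
    }

    // Remember the blobIds together with the time they became garbage.
    void markDeleted(Set<String> blobIds, long nowMillis) {
        for (String id : blobIds) {
            pending.putIfAbsent(id, nowMillis);
        }
    }

    // Return (and forget) blobIds whose grace period has elapsed, i.e.
    // other cluster nodes had time to move to the newer index state.
    List<String> purgeable(long nowMillis) {
        List<String> result = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= graceMillis) {
                result.add(e.getKey());
                it.remove();
            }
        }
        return result;
    }
}
```

The caller would run blobsOfDeletedFiles() on each index update, pass
the result to markDeleted(), and periodically hand whatever purgeable()
returns to the DataStore for deletion. How long the grace period needs
to be is left open here.
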
Chetan Mehrotra


On Tue, Mar 10, 2015 at 4:53 PM, Michael Marth <[email protected]> wrote:
> Could the Lucene indexer explicitly track these files (e.g. as a property in 
> the index definition)? And also take care of removing them? (the latter part 
> is assuming that the same index file is not identical across various 
> definitions)
>
>> On 10 Mar 2015, at 12:18, Chetan Mehrotra <[email protected]> wrote:
>>
>> On Tue, Mar 10, 2015 at 4:12 PM, Michael Dürig <[email protected]> wrote:
>>> The problem is that you don't even have a list of all previous revisions of
>>> the root node state. Revisions are created on the fly and kept as needed.
>>
>> hmm yup. Then we would need to think of some other approach to know
>> all the blobId referred to by the Lucene Index files
>>
>>
>> Chetan Mehrotra
>
