That's one approach we can think about. Thinking further, Lucene's design of immutable files makes things simpler (ignoring the reindex case). In normal usage Lucene never reuses a file name and never modifies an existing file, so we would not have to worry about reading older revisions. We only need to keep track of deleted files and the blobs referred to by them.
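A rough sketch of that tracking, for illustration only. It assumes each index state can be flattened into a map from index file name to the blob ids that file refers to. That map shape and the names DeletedBlobCollector and collectDeletedBlobIds are made up here and are not existing Oak or Lucene API. Because file names are never reused, any name present in the previous state but missing from the current one is a deleted file.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Oak or Lucene.
final class DeletedBlobCollector {

    /**
     * Returns the blob ids referred to by files that exist in 'before'
     * but not in 'after'. Blob ids still referred to by any file in
     * 'after' are skipped, as a precaution in case a blob is shared.
     */
    static Set<String> collectDeletedBlobIds(Map<String, List<String>> before,
                                             Map<String, List<String>> after) {
        // Everything the current state still refers to must be kept.
        Set<String> stillReferenced = new HashSet<>();
        for (List<String> ids : after.values()) {
            stillReferenced.addAll(ids);
        }

        // Sorted set so the result is stable when logged or compared.
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : before.entrySet()) {
            // File names are never reused, so a missing name means deleted.
            if (!after.containsKey(e.getKey())) {
                for (String id : e.getValue()) {
                    if (!stillReferenced.contains(id)) {
                        candidates.add(id);
                    }
                }
            }
        }
        return candidates;
    }
}
```

The returned ids are only candidates for removal. When to actually delete them is a separate decision.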
So once a file node is marked as deleted, we could perform a diff (we already do one to detect when the index has changed) and collect the blobIds of the deleted file nodes from the previous state. Those can be safely deleted *after* some time (allowing other cluster nodes to pick up the change).

Chetan Mehrotra

On Tue, Mar 10, 2015 at 4:53 PM, Michael Marth <[email protected]> wrote:
> Could the Lucene indexer explicitly track these files (e.g. as a property in
> the index definition)? And also take care of removing them? (the latter part
> is assuming that the same index file is not identical across various
> definitions)
>
>> On 10 Mar 2015, at 12:18, Chetan Mehrotra <[email protected]> wrote:
>>
>> On Tue, Mar 10, 2015 at 4:12 PM, Michael Dürig <[email protected]> wrote:
>>> The problem is that you don't even have a list of all previous revisions of
>>> the root node state. Revisions are created on the fly and kept as needed.
>>
>> hmm yup. Then we would need to think of some other approach to know
>> all the blobId referred to by the Lucene Index files
>>
>>
>> Chetan Mehrotra
>
