Vikas Saurabh created OAK-7246:
----------------------------------

             Summary: Improve cleanup of locally copied index files
                 Key: OAK-7246
                 URL: https://issues.apache.org/jira/browse/OAK-7246
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: lucene
            Reporter: Vikas Saurabh
            Assignee: Vikas Saurabh


This task is to rethink how we clean up locally copied index files that are 
no longer in use.

Current approach:
# index writers, while creating index files, keep a list of 
currently-being-written files
## this list is cleared when a new index writer comes into play
# the index tracker opens a new index (at a new revision) via observation
## while it is being opened, we also record the current dir listing of the 
local index files
# while opening the new index, the tracker closes the old revision of the 
index reader
## during this close, the local files recorded during open are purged if (they 
don't show up in the remote view of the index && they aren't part of the 
currently-being-written list kept by the index writer)
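
The purge condition in the last step can be written down as a small set 
computation. This is only an illustrative sketch in Python, not Oak code: the 
function name and its parameters are made up here, and it assumes that each 
of the three lists is just a collection of file names.

```python
def files_to_purge(local_at_open, remote_view, being_written):
    """Files a closing reader would delete under the rule described above.

    local_at_open: local dir listing recorded when the reader was opened
    remote_view:   files that make up this reader's remote view of the index
    being_written: shared currently-being-written list kept by the writer
    (all names are hypothetical, chosen for this sketch only)
    """
    return sorted(set(local_at_open) - set(remote_view) - set(being_written))


# A stale local file "x" is purged; "c" survives because a writer still owns it.
print(files_to_purge(["a", "b", "c", "x"], ["a", "b"], ["c"]))
```

Note that the rule only consults the shared list as it is at close time, which 
may belong to a later writer than the one that was active at open time.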

In at least the following timeline, this approach would incur extra copying 
(and, as a side effect, would also open some index files directly off the 
remote input stream during CoWs):
# CoW1 creates [a, b]
# CoW2 starts and creates [c, d], removes [a, b] from remote
# CoR1 opens an index due to CoW1
## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
# CoW2 finishes
# CoW3 creates [e, f], removes [a,b] from remote
## CoW-currently-being-written-list=[e,f]
# CoR2 opens due to CoW2
## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
# CoR1 closes
## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't 
part of shared list ([e,f])
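
The timeline can be replayed with plain sets to see which files end up being 
deleted while still needed. Again, this is a toy model in Python under the 
same assumptions as the timeline itself (which may be off), not a test 
against Oak; all variable names are invented for the sketch.

```python
# Shared currently-being-written list; each new writer replaces it.
being_written = {"a", "b"}            # CoW1 creates [a, b]
being_written = {"c", "d"}            # CoW2 starts and creates [c, d]

# CoR1 opens due to CoW1 and records the local dir listing at that moment.
cor1_local = {"a", "b", "c", "d"}
cor1_remote = {"a", "b"}

# CoW2 finishes, CoW3 starts: the shared list now only knows about [e, f].
being_written = {"e", "f"}

# CoR2 opens due to CoW2; its remote view is [c, d].
cor2_remote = {"c", "d"}

# CoR1 closes and applies the purge rule with the *current* shared list.
purged = cor1_local - cor1_remote - being_written

# Files that were purged although the reader that is now open needs them.
needs_recopy = purged & cor2_remote

print(sorted(purged))        # ['c', 'd']
print(sorted(needs_recopy))  # ['c', 'd']
```

In this model [c, d] are removed locally even though CoR2 still reads them, so 
they would have to be copied again or read off the remote.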

Disclaimer: the timeline might be off a bit (I haven't written a test yet), 
but the basic point is that a CoR could be working with one index file set 
while new files have come in twice since that CoR opened, so the shared list 
doesn't have complete information about the newly written files.

[~chetanm], can you please check the timeline above? I'll try to work on a 
test case in the meantime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
