[ https://issues.apache.org/jira/browse/OAK-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Jain updated OAK-7246:
---------------------------
    Summary: Improve cleanup of locally copied index files  (was: Improve ceanup of locally copied index files)

> Improve cleanup of locally copied index files
> ---------------------------------------------
>
>                 Key: OAK-7246
>                 URL: https://issues.apache.org/jira/browse/OAK-7246
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Major
>
> This task is to rethink how we should clean up locally copied index
> files that are no longer in use.
> Current approach:
> # index writers, while creating index files, keep a list of
> currently-being-written files
> ## this list is cleared when a new index writer comes into play
> # the index tracker opens a new index (at a new revision) via observation
> ## while it is being opened, we also track the current dir listing of the
> local index files
> # while opening the new index, the tracker closes the old revision of the
> index reader
> ## during this close, the local files noted above during open are purged if
> (they don't show up in the remote view of the index && they aren't part of
> the currently-being-written list of the index writer)
> This approach, at least in the following timeline, would incur extra copying
> (and, as a side effect, also open some index files directly off the remote
> input stream during CoWs):
> # CoW1 creates [a, b]
> # CoW2 starts and creates [c, d], removes [a, b] from remote
> # CoR1 opens an index due to CoW1
> ## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
> # CoW2 finishes
> # CoW3 creates [e, f], removes [a,b] from remote
> ## CoW-currently-being-written-list=[e,f]
> # CoR2 opens due to CoW2
> ## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
> # CoR1 closes
> ## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't
> part of the shared list ([e,f])
> Disclaimer: the timeline might be off a bit (haven't written a test yet)...
> but the basic point is that a CoR could be working with one index file set
> while new files have come in twice since that CoR opened - thus the shared
> list doesn't have complete information about the new files written.
> [~chetanm], can you please check the timeline above - I'll try to work on a
> test case in the meantime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
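
The purge rule and the timeline described in the issue can be replayed as a small set-based simulation. This is only a minimal sketch under the assumption that the rule is exactly as stated in the issue text (delete a local file if it is neither in the closing reader's remote view nor in the writer's currently-being-written list). The class and method names (`LocalIndexCleanupSketch`, `files`, `purgedOnClose`) are hypothetical and are not Oak APIs.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names are Oak APIs.
class LocalIndexCleanupSketch {

    // Convenience helper: build a sorted set of file names.
    static Set<String> files(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Purge rule as described in the issue: a local file noted when the reader
    // was opened is deleted on close if it is absent from that reader's remote
    // view AND absent from the writer's currently-being-written list.
    static Set<String> purgedOnClose(Set<String> localListingAtOpen,
                                     Set<String> remoteViewOfReader,
                                     Set<String> currentlyBeingWritten) {
        Set<String> purged = new TreeSet<>(localListingAtOpen);
        purged.removeAll(remoteViewOfReader);
        purged.removeAll(currentlyBeingWritten);
        return purged;
    }

    public static void main(String[] args) {
        // State taken from the timeline in the issue.
        Set<String> cor1Local = files("a", "b", "c", "d"); // local-list-CoR1
        Set<String> cor1Remote = files("a", "b");          // remote view seen by CoR1
        Set<String> sharedList = files("e", "f");          // being written by CoW3
        Set<String> cor2Remote = files("c", "d");          // remote view seen by CoR2

        Set<String> purged = purgedOnClose(cor1Local, cor1Remote, sharedList);

        // Files that were purged although the newer reader still refers to them.
        Set<String> stillNeeded = new TreeSet<>(purged);
        stillNeeded.retainAll(cor2Remote);

        System.out.println("purged when CoR1 closes: " + purged);
        System.out.println("of those, still used by CoR2: " + stillNeeded);
    }
}
```

With the sets from the timeline, both printed sets are [c, d]: the files purged when CoR1 closes are exactly the ones CoR2 was opened on, which is the extra copying the issue describes.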