[
https://issues.apache.org/jira/browse/OAK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-3629:
---------------------------------
Fix Version/s: (was: 1.3.15)
1.6
> Index corruption seen with CopyOnRead when index definition is recreated
> ------------------------------------------------------------------------
>
> Key: OAK-3629
> URL: https://issues.apache.org/jira/browse/OAK-3629
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Priority: Blocker
> Fix For: 1.6
>
>
> CopyOnRead logic relies on {{reindexCount}} to determine the name of the
> directory into which index files are copied. In the normal flow, when an
> index is reindexed this count is incremented and the newer index files are
> copied to a new directory.
> However, if the index definition node is recreated by some deployment
> process, the count is reset to 0. As a result, index files newly created by
> reindexing are copied into an already existing directory, which can lead
> to corruption.
> So what happened here was:
> # The system started with index definition I1, and indexing completed with
> index files saved under index/hash(indexpath)/1 (where 1 is the current
> reindex count).
> # A new index definition package was deployed, which reset the reindex count.
> Reindexing then happened again, and the CopyOnRead logic, per the current
> design, reused the existing index directory. It so happens that Lucene
> creates files with the same name and the same size but different content.
> This trips up CopyOnRead's length-based index corruption check and thus
> causes the new Lucene index to become corrupt.
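
The two steps above can be illustrated with a minimal sketch. It is not Oak's actual code: the class, the method names, the index path, and the use of String.hashCode() as the hash are all hypothetical, and the check is assumed to compare file lengths only, as the description suggests.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReindexCountCollision {
    // Hypothetical stand-in for the naming scheme described above:
    // index/<hash(indexPath)>/<reindexCount>
    static String localDirName(String indexPath, long reindexCount) {
        return "index/" + Integer.toHexString(indexPath.hashCode()) + "/" + reindexCount;
    }

    // Assumed length-only comparison between the local copy and the persisted file.
    static boolean looksSame(byte[] localCopy, byte[] persisted) {
        return localCopy.length == persisted.length;
    }

    public static void main(String[] args) {
        String indexPath = "/oak:index/example"; // hypothetical index path

        // Step 1: definition I1 is indexed with reindex count 1.
        String firstDir = localDirName(indexPath, 1);

        // Step 2: the definition is recreated, so the count restarts at 0,
        // and the following reindex bumps it to 1 again.
        long count = 0;
        count++;
        String secondDir = localDirName(indexPath, count);
        System.out.println("same directory reused: " + firstDir.equals(secondDir));

        // Same file name, same length, different content.
        byte[] oldFile = "old-segment".getBytes(StandardCharsets.UTF_8);
        byte[] newFile = "new-segment".getBytes(StandardCharsets.UTF_8);
        System.out.println("length check passes:   " + looksSame(oldFile, newFile));
        System.out.println("content is identical:  " + Arrays.equals(oldFile, newFile));
    }
}
```

Under these assumptions the stale local file is taken for the new one, because both the directory name and the file length match.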
> *Note that the corruption here is transient, i.e. the persisted index is not
> corrupted*. Only the locally copied index gets corrupted. Cleaning up the
> index directory fixes the issue and can be used as a workaround.
> *Fix*
> After discussing with [~tmueller], the following approach can be used.
> Instead of relying on the reindex count, we can maintain a hidden, randomly
> generated uuid and store it in the index config. This would be used to derive
> the name of the directory on the filesystem. If the index definition gets
> reset, the uuid can be regenerated.
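
A rough sketch of the proposed approach follows, assuming the index config can be modelled as a simple map. The property name ":uid", the class, and the hash are hypothetical and only illustrate the idea; the actual fix in Oak may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class UniqueIdDirNaming {
    // Hypothetical name for the hidden property holding the generated uuid.
    static final String UID_PROP = ":uid";

    // Returns the uuid stored in the index config, generating one if it is missing.
    static String ensureUid(Map<String, String> indexConfig) {
        return indexConfig.computeIfAbsent(UID_PROP, k -> UUID.randomUUID().toString());
    }

    // Directory name derived from the uuid instead of the reindex count.
    static String localDirName(String indexPath, Map<String, String> indexConfig) {
        return "index/" + Integer.toHexString(indexPath.hashCode()) + "/" + ensureUid(indexConfig);
    }

    public static void main(String[] args) {
        String indexPath = "/oak:index/example"; // hypothetical index path

        Map<String, String> original = new HashMap<>();
        String firstDir = localDirName(indexPath, original);

        // A recreated definition starts without the hidden property,
        // so a fresh uuid (and hence a fresh directory) is produced.
        Map<String, String> recreated = new HashMap<>();
        String secondDir = localDirName(indexPath, recreated);

        System.out.println("stable for same definition: "
                + firstDir.equals(localDirName(indexPath, original)));
        System.out.println("new directory after reset:  " + !firstDir.equals(secondDir));
    }
}
```

Because the uuid lives in the index config itself, recreating the definition discards it, and the next run cannot land in the old directory.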
> *Workaround*
> Clean the directory used by CopyOnRead, which is <repo home>/index, before
> restart.