[ https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radhika Kundam reassigned ATLAS-4408:
-------------------------------------

    Assignee: Ashutosh Mestry  (was: Radhika Kundam)

> Dynamic handling of failure in updating index
> ---------------------------------------------
>
>                 Key: ATLAS-4408
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4408
>             Project: Atlas
>          Issue Type: New Feature
>          Components: atlas-core
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Radhika Kundam
>            Assignee: Ashutosh Mestry
>            Priority: Major
>              Labels: indexing
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: IndexRecovery.png, IndexRecovery_FunctionalFlow.png
>
>
> *Index failure resilience:* dynamic handling of failure in updating index
> (i.e. the HBase commit succeeds but the index commit fails).
> When the secondary persistence (the index) fails, the indexes become
> inconsistent for every transaction that failed at Solr. The only existing
> way to repair this is to re-index all the data, which is time-consuming
> because it involves indexing the entire database.
> To recover from such inconsistencies we can use the *transaction
> write-ahead log option*. With the write-ahead log enabled (tx.log-tx),
> JanusGraph keeps the log data for all transactions, which can be used to
> recover indices after a failure. Maintaining log data for every
> transaction adds overhead, but it makes the system more resilient and
> proactive, so the advantages of this approach can offset the overhead of
> maintaining the log data.
> Design details are as follows.
> # Start a new service, IndexRecoveryService, at Atlas startup.
> ## Continuously monitor Solr (index client) health every retryTime
> milliseconds.
> ### If Solr is healthy and a recovery start time is available:
> #### Start transaction recovery from the available recovery start time
> (which was noted when Solr became unhealthy).
> #### Persist the current recovery start time as the previous one, so it
> can be passed later as a custom recovery time to start index recovery if
> required.
> #### Reset the current recovery start time.
> #### Continue with the Solr health check.
> ### If Solr is unhealthy and no recovery start time is available:
> #### Shut down the existing transaction recovery process.
> #### Note the current time as the next recovery start time and persist
> it in the graph.
> #### Continue with the Solr health check.
> Configuration properties used by this feature:
> 1. To enable or disable index recovery (enabled by default on Atlas
> startup):
> *atlas.graph.enable.index.recovery=true*
> 2. To configure how frequently the Solr health check runs:
> *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
> 3. To start index recovery from a custom, user-provided recovery time:
> *atlas.graph.index.search.solr.recovery.start.time=1630086622*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
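
The health-check loop in the design details can be sketched as a small state machine. This is only an illustration of the described flow, not the Atlas implementation: IndexRecoveryMonitor, the Hooks interface, and the "current"/"previous" key names are hypothetical, and the actual Solr health check, transaction-log replay, and graph persistence are left behind the interface as assumptions.

```java
// Hypothetical sketch of the monitoring flow; none of these names are Atlas APIs.
final class IndexRecoveryMonitor {

    /** Assumed hooks for the parts the issue does not specify. */
    interface Hooks {
        boolean isIndexHealthy();              // e.g. a Solr health probe
        void startRecovery(long fromTime);     // replay the transaction log from this time
        void stopRecovery();                   // shut down a running recovery process
        void persist(String key, Long value);  // store a timestamp (or clear it with null)
    }

    private final Hooks hooks;
    private Long recoveryStartTime;            // null while no recovery is pending

    /** customStartTime may be null; it stands in for a user-provided start time. */
    IndexRecoveryMonitor(Hooks hooks, Long customStartTime) {
        this.hooks = hooks;
        this.recoveryStartTime = customStartTime;
    }

    /** One health-check iteration; a caller would run this every retry interval. */
    void checkOnce(long now) {
        boolean healthy = hooks.isIndexHealthy();

        if (healthy && recoveryStartTime != null) {
            // Index is back: recover from the time noted when it went down.
            hooks.startRecovery(recoveryStartTime);
            // Keep that time as "previous" so it can be reused as a custom start time.
            hooks.persist("previous", recoveryStartTime);
            // Reset the current recovery start time.
            recoveryStartTime = null;
            hooks.persist("current", null);
        } else if (!healthy && recoveryStartTime == null) {
            // Index just went down: stop any recovery and note the outage time.
            hooks.stopRecovery();
            recoveryStartTime = now;
            hooks.persist("current", now);
        }
        // In every other case nothing changes and monitoring simply continues.
    }
}
```

Keeping a single iteration in checkOnce, with the clock value passed in, makes the two transitions easy to exercise without a real Solr instance; a scheduler or a sleeping thread would supply the retry interval.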