[jira] [Created] (SOLR-12305) Making buffering tlog bounded for faster recovery

Cao Manh Dat (JIRA) Wed, 02 May 2018 19:01:18 -0700

Cao Manh Dat created SOLR-12305:
-----------------------------------

             Summary: Making buffering tlog bounded for faster recovery
                 Key: SOLR-12305
                 URL: https://issues.apache.org/jira/browse/SOLR-12305
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Cao Manh Dat
            Assignee: Cao Manh Dat

The current recovery process has 2 main problems (pointed out by
[~shalinmangar] ) which make it may never finish.
# The replay updates process is too slow, we do it in a single-thread fashion.
Therefore if the more updates get appended at a faster rate, the replay process
will be never finished
# The buffering tlog is unbounded, we keep adding more entries to buffering
tlog and waiting for them to get replayed. If we have a way to reduce the
number of updates in buffering tlog, even when replay process is slow it will
eventually finish.

I come up with a solution for the second problem which is described on this
link:

[https://docs.google.com/document/d/14DCkYRvYnQmactyWek3nYtUVdpu_CVIA4ZBTfQigjlU/edit?usp=sharing]

In short, the document presents a modification for current recovery process
(section 3: algorithm) and also proof the correctness of the modification
(section 1 and 2). There are some pros in this approach
* Making buffering tlog bounded.
* It will automatically throttle updates from the leader, imagine this case
** We have a shard with a leader and a replica. When leader sends replica an
update.
** If the replica is healthy, the leader will have to wait for the replica to
finish process that updates before return to users. Let's call the total time
for an update is T0
** If the replica is recovering, in the current code, the replica will only
append that update to its buffering tlog (which is much faster than indexing),
so the total time for an update is T1 < T0. Therefore the rate of incoming
updates will be higher in this case.
** In above design, T1 will be subequal to T0.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-12305) Making buffering tlog bounded for faster recovery

Reply via email to