[hbase] branch branch-2 updated: HBASE-28114 Add more comments to explain why replication log queue could never be empty for normal replication queue (#5443)

zhangduo Fri, 20 Oct 2023 07:38:00 -0700

This is an automated email from the ASF dual-hosted git repository.

zhangduo pushed a commit to branch branch-2
in repository https://gitbox.apache.org/repos/asf/hbase.git



The following commit(s) were added to refs/heads/branch-2 by this push:
     new 2951e564745 HBASE-28114 Add more comments to explain why replication 
log queue could never be empty for normal replication queue (#5443)
2951e564745 is described below

commit 2951e564745a7deafc6b723077ba85f36b83ab46
Author: Duo Zhang <zhang...@apache.org>
AuthorDate: Fri Oct 20 22:22:16 2023 +0800

    HBASE-28114 Add more comments to explain why replication log queue could 
never be empty for normal replication queue (#5443)
    
    Also add a retry logic to make the code more robust
    
    Signed-off-by: Xiaolin Ha <haxiao...@apache.org>
    (cherry picked from commit 4429de48bace58f7581a3ad568c19531a1697071)
---
 .../replication/regionserver/WALEntryStream.java   | 34 ++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
index 22bf05b3741..8953d18a271 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
@@ -340,11 +340,35 @@ class WALEntryStream implements Closeable {
     boolean beingWritten = pair.getSecond();
     LOG.trace("Reading WAL {}; result={}, currently open for write={}", 
this.currentPath, state,
       beingWritten);
+    // The below implementation needs to make sure that when beingWritten == 
true, we should not
+    // dequeue the current WAL file in logQueue.
     switch (state) {
       case NORMAL:
         // everything is fine, just return
         return HasNext.YES;
       case EOF_WITH_TRAILER:
+        // in readNextEntryAndRecordReaderPosition, we will acquire 
rollWriteLock, and we can only
+        // schedule a close writer task, in which we will write trailer, under 
the rollWriteLock, so
+        // typically if beingWritten == true, we should not reach here, as we 
need to reopen the
+        // reader after writing the trailer. The only possible way to reach 
here while beingWritten
+        // == true is due to the inflightWALClosures logic in AbstractFSWAL, 
as if the writer is
+        // still in this map, we will consider it as beingWritten, but 
actually, here we could make
+        // sure that the new WAL file has already been enqueued into the 
logQueue, so here dequeuing
+        // the current log file is safe.
+        if (beingWritten && logQueue.getQueue(walGroupId).size() <= 1) {
+          // As explained above, if we implement everything correctly, we 
should not arrive here.
+          // But anyway, even if we reach here due to some code changes in the 
future, reading
+          // the file again can make sure that we will not accidentally 
consider the queue as
+          // finished, and since there is a trailer, we will soon consider the 
file as finished
+          // and move on.
+          LOG.warn(
+            "We have reached the trailer while reading the file '{}' which is 
currently"
+              + " beingWritten, but it is the last file in log queue {}. This 
should not happen"
+              + " typically, try to read again so we will not miss anything",
+            currentPath, walGroupId);
+          return HasNext.RETRY;
+        }
+        assert !beingWritten || logQueue.getQueue(walGroupId).size() > 1;
         // we have reached the trailer, which means this WAL file has been 
closed cleanly and we
         // have finished reading it successfully, just move to the next WAL 
file and let the upper
         // layer start reading the next WAL file
@@ -442,6 +466,16 @@ class WALEntryStream implements Closeable {
    * Returns whether the file is opened for writing.
    */
   private Pair<WALTailingReader.State, Boolean> 
readNextEntryAndRecordReaderPosition() {
+    // we must call this before actually reading from the reader, as this 
method will acquire the
+    // rollWriteLock. This is very important, as we will enqueue the new WAL 
file in postLogRoll,
+    // and before this happens, we could have already finished closing the 
previous WAL file. If we
+    // do not acquire the rollWriteLock and return whether the current file is 
being written to, we
+    // may finish reading the previous WAL file and start to read the next 
one, before it is
+    // enqueued into the logQueue, thus lead to an empty logQueue and make the 
shipper think the
+    // queue is already ended and quit. See HBASE-28114 and related issues for 
more details.
+    // in the future, if we want to optimize the logic here, for example, do 
not call this method
+    // every time, or do not acquire rollWriteLock in the implementation of 
this method, we need to
+    // carefully review the optimized implementation
     OptionalLong fileLength = 
walFileLengthProvider.getLogFileSizeIfBeingWritten(currentPath);
     WALTailingReader.Result readResult = reader.next(fileLength.orElse(-1));
     long readerPos = readResult.getEntryEndPos();

[hbase] branch branch-2 updated: HBASE-28114 Add more comments to explain why replication log queue could never be empty for normal replication queue (#5443)

Reply via email to