Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-15 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-4251464552

   Hi @Apache9 @wchevreuil, just a gentle ping on this PR when you get a 
chance. I've addressed the review comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-15 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3085931097


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##


Review Comment:
   Yes, we intentionally do not restart here: this path can be reached during 
normal lifecycle transitions (e.g., config updates), while restarts are only 
triggered explicitly on failure via abortAndRestart(). I have addressed the 
comments below as per your suggestions.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-15 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3085912904


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +277,12 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this shipper since it 
might come from
+// beforePersistingReplicationOffset. So abort and restart the 
Shipper, and WAL reading
+// will resume from the last successfully persisted offset
+abortAndRestart(ioe);

Review Comment:
   Good point, I've updated the code to rethrow the exception and handle 
abortAndRestart in the top-level run() loop.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-14 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3080258595


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##


Review Comment:
   I might be missing something, but to my understanding we don't restart the 
worker when we enter here, because that's the "finished" branch. The worker is 
already in the finished state because there's no more data to process.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-14 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3080238529


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced

Review Comment:
   Okay, we can leave this safety check here, but it's still unclear to me: if 
a single `walGroupId` is strictly handled by a single thread in the worker 
pool, how is it possible for that single thread to call restart twice?
   
   I think what you describe as "multiple threads trigger restart around the 
same time" doesn't seem possible, because of what I mentioned above: exactly 
one thread is dealing with a `walGroupId`.








Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-13 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3074515880


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced

Review Comment:
   One scenario is concurrent restart attempts, though the chance is very rare 
imho. For example, if the same worker encounters failure paths close together 
(or multiple threads trigger restart around the same time), one call to 
restartShipper may already have replaced the worker in workerThreads. When the 
second call executes, the current entry is no longer oldWorker, so we skip 
replacing it again.
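   The guard being described can be illustrated with a minimal, self-contained 
sketch. Plain `String` values stand in for shipper threads here, and the class 
and method names are illustrative, not the actual HBase classes:

```java
import java.util.concurrent.ConcurrentHashMap;

public class RestartIfCurrent {
  public static final ConcurrentHashMap<String, String> workers = new ConcurrentHashMap<>();

  // Replace oldWorker with newWorker only if oldWorker is still the mapped
  // value; a concurrent restart that already swapped the entry is left alone.
  public static String restart(String walGroupId, String oldWorker, String newWorker) {
    return workers.compute(walGroupId, (key, current) -> {
      if (current != oldWorker) {
        return current; // already replaced by another restart attempt
      }
      return newWorker;
    });
  }

  public static void main(String[] args) {
    String w1 = "shipper-1";
    workers.put("wal-group-a", w1);
    System.out.println(restart("wal-group-a", w1, "shipper-2")); // shipper-2
    // A second call still holding the stale oldWorker reference is a no-op.
    System.out.println(restart("wal-group-a", w1, "shipper-3")); // shipper-2
  }
}
```

   Because `compute` runs the remapping function atomically per key, the 
check-and-swap cannot interleave with another restart of the same `walGroupId`.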






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-09 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3044610064


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##


Review Comment:
   What I can recall is that for reasons like updating the replication config, 
we may terminate a replication source and recreate it before the replication 
source finishes, so maybe we do not always need to restart the worker when 
entering here. But I'm still a bit nervous that we might miss some scenarios 
where we need to restart the worker if we only call abortAndRestart when 
catching an exception...



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +277,12 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this shipper since it 
might come from
+// beforePersistingReplicationOffset. So abort and restart the 
Shipper, and WAL reading
+// will resume from the last successfully persisted offset
+abortAndRestart(ioe);

Review Comment:
   Better to rethrow the exception to the top layer and call abortAndRestart 
there? After returning from this method, we may still have other logic in the 
parent method, which may cause inconsistency since the new shipper may already 
be started...
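   The control flow being suggested (rethrow from shipEdits, handle once in 
run()) could look roughly like the sketch below. `FatalShipError`, the boolean 
parameter, and the return strings are hypothetical stand-ins, not the actual 
patch:

```java
public class ShipperSketch {
  static class FatalShipError extends RuntimeException {
    FatalShipError(Throwable cause) { super(cause); }
  }

  // Simulates the shipping loop body; failPersist stands in for an
  // offset-persist failure inside the loop.
  public static void shipEdits(boolean failPersist) {
    try {
      if (failPersist) {
        throw new java.io.IOException("offset persist failed");
      }
      // ... ship entries, persist offset ...
    } catch (java.io.IOException ioe) {
      // Do not restart here: the parent may still run logic after this
      // method returns, racing with the replacement shipper.
      throw new FatalShipError(ioe);
    }
  }

  // Top-level loop: the single place that decides to abort and restart.
  public static String run(boolean failPersist) {
    try {
      shipEdits(failPersist);
      return "ok";
    } catch (FatalShipError e) {
      return "abortAndRestart";
    }
  }
}
```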



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced
+  }
+
+  LOG.warn("Restarting shipper for walGroupId={}", walGroupId);
+
+  try {
+ReplicationSourceShipper newWorker = createNewShipper(walGroupId);
+startShipper(newWorker);
+return newWorker;
+  } catch (Exception e) {
+LOG.error("Failed to restart shipper for walGroupId={}", walGroupId, 
e);
+return current; // retry later

Review Comment:
   Checked the code; createNewShipper and startShipper both do not throw any 
exceptions, so do we really need to catch an exception here?






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-08 Thread via GitHub


anmolnar commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-4210586624

   > Looking at this. I don't have additional comments, but waiting to approve 
until existing comments are resolved.
   
   Thanks @charlesconnell , much appreciated. 
   Patiently waiting for @Apache9 to comment on latest changes.





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-08 Thread via GitHub


anmolnar commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-4210591233

   @taklwu @wchevreuil Would you guys like to update your previous reviews?





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-04-06 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3043017939


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +255,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this worker since it 
might come from

Review Comment:
   Hi @Apache9, just a gentle reminder to please take a look at the code 
review when you get a chance.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-31 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3016292503


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced

Review Comment:
   How is this possible?



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced
+  }
+
+  LOG.warn("Restarting shipper for walGroupId={}", walGroupId);
+
+  try {
+ReplicationSourceShipper newWorker = createNewShipper(walGroupId);

Review Comment:
   You might want to reuse `tryStartNewShipper(walGroupId)` here.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-30 Thread via GitHub


taklwu commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3012835303


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepForRetries() {
 return sleepForRetries;
   }
+
+  void restartShipper(String walGroupId, ReplicationSourceShipper oldWorker) {
+workerThreads.compute(walGroupId, (key, current) -> {
+  if (current != oldWorker) {
+return current; // already replaced
+  }
+
+  LOG.warn("Restarting shipper for walGroupId={}", walGroupId);
+
+  try {
+ReplicationSourceShipper newWorker = createNewShipper(walGroupId);
+startShipper(newWorker);
+return newWorker;
+  } catch (Exception e) {
+LOG.error("Failed to restart shipper for walGroupId={}", walGroupId, 
e);
+return current; // retry later

Review Comment:
   Can you check whether this comment is right? I used the Cursor AI editor, 
which also confirmed this may be a potential bug.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-30 Thread via GitHub


Copilot commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3012357137


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +297,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (accumulatedSizeSinceLastUpdate == 0 || lastShippedBatch == null) {
+  return false;
+}
+
+// Default behaviour to update offset immediately after replicate()
+if (offsetUpdateSizeThresholdBytes == -1 && offsetUpdateIntervalMs == 
Long.MAX_VALUE) {
+  return true;
+}
+
+return (accumulatedSizeSinceLastUpdate >= offsetUpdateSizeThresholdBytes)
+  || (EnvironmentEdgeManager.currentTime() - lastOffsetUpdateTime >= 
offsetUpdateIntervalMs);

Review Comment:
   `shouldPersistLogPosition` treats `offsetUpdateSizeThresholdBytes == -1` as 
a real threshold, so `accumulatedSizeSinceLastUpdate >= -1` will be true 
whenever any data has been shipped. This makes it impossible to enable 
*time-based only* persistence by setting just 
`hbase.replication.shipper.offset.update.interval.ms` (you will still persist 
immediately after every batch). Consider treating negative/zero thresholds as 
"disabled" for size-based persistence and only applying the size comparison 
when the configured threshold is > 0.
   ```suggestion
   // Treat non-positive size thresholds as "disabled" for size-based 
persistence.
   boolean sizeThresholdReached = offsetUpdateSizeThresholdBytes > 0
 && accumulatedSizeSinceLastUpdate >= offsetUpdateSizeThresholdBytes;
   boolean timeThresholdReached =
 EnvironmentEdgeManager.currentTime() - lastOffsetUpdateTime >= 
offsetUpdateIntervalMs;
   
   return sizeThresholdReached || timeThresholdReached;
   ```
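   The pitfall can be demonstrated with a tiny standalone check (the method 
names and values here are made up for illustration, not the shipper code):

```java
public class ThresholdPitfall {
  // Current behaviour: the -1 sentinel is used as a real threshold, so any
  // shipped data satisfies the size comparison.
  public static boolean buggy(long accumulated, long sizeThreshold) {
    return accumulated >= sizeThreshold;
  }

  // Suggested behaviour: non-positive thresholds disable size-based
  // persistence, so time-only configurations work as intended.
  public static boolean fixed(long accumulated, long sizeThreshold) {
    return sizeThreshold > 0 && accumulated >= sizeThreshold;
  }
}
```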



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -118,11 +154,24 @@ public final void run() {
 } else {
   shipEdits(entryBatch);
 }

Review Comment:
   With deferred offset persistence enabled (size/interval thresholds), 
receiving `WALEntryBatch.NO_MORE_DATA` can transition the worker to `FINISHED` 
without ever calling `persistLogPosition()` for the last shipped batch. This 
can drop the final offset update and any deferred HFile-ref cleanup, causing 
re-replication/recovery work. Consider persisting (if 
`accumulatedSizeSinceLastUpdate > 0`) before calling `noMoreData()` / before 
setting state to `FINISHED`.
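   A minimal sketch of that suggestion, assuming hypothetical field and method 
names rather than the actual shipper code:

```java
public class FinishFlush {
  public long accumulatedSizeSinceLastUpdate;
  public boolean persisted;

  // Stand-in for persisting the last shipped offset and clearing state.
  public void persistLogPosition() {
    persisted = true;
    accumulatedSizeSinceLastUpdate = 0;
  }

  // Called when the reader returns NO_MORE_DATA: flush any deferred offset
  // before the worker transitions to FINISHED so the final batch's position
  // is not dropped.
  public String noMoreData() {
    if (accumulatedSizeSinceLastUpdate > 0) {
      persistLogPosition();
    }
    return "FINISHED";
  }
}
```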



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +252,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+accumulatedSizeSinceLastUpdate += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);

Review Comment:
   `entriesForCleanUpHFileRefs.addAll(entries)` retains full `WAL.Entry` 
objects across batches until the next offset persist. With large 
thresholds/intervals this can hold a large amount of WAL data in memory 
(including edits/cells) even though cleanup only needs bulk-load-related 
metadata. Consider recording only the minimal information needed for cleanup 
(e.g., extracted bulk-load store file refs) or only retaining entries that 
actually contain `WALEdit.BULK_LOAD` cells.
   ```suggestion
   // Only retain WAL entries that contain bulk load cells for later 
HFile ref cleanup
   for (Entry entry : entries) {
 WALEdit edit = entry.getEdit();
 if (edit == null) {
   continue;
 }
 for (Cell cell : edit.getCells()) {
   if (CellUtil.matchFamily(cell, WALEdit.BULK_LOAD)) {
 entriesForCleanUpHFileRefs.add(entry);
 break;
   }
 }
   }
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,11 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  /**
+   * Hook invoked before persisting replication offsets. Eg: Buffered 
endpoints can flush/close WALs

Review Comment:
   Javadoc typo: use "e.g.," instead of "Eg:".
   ```suggestion
  * Hook invoked before persisting replication offsets, e.g., buffered 
endpoints can flush/close WALs
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -866,4 +866,24 @@ public long getTotalReplicatedEdits() {
   long getSleepF

Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-30 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r3011386961


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +255,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this worker since it 
might come from

Review Comment:
   Hi @Apache9, just a gentle reminder on this new commit when you get a 
moment. Thank you!






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-26 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-4134778608

   Failing test 
`org.apache.hadoop.hbase.master.procedure.TestSnapshotProcedureRSCrashes.testRegionServerCrashWhileVerifyingSnapshot`
 is unrelated to this change.





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-26 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2994837149


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -90,6 +97,10 @@ public ReplicationSourceShipper(Configuration conf, String 
walGroupId, Replicati
   this.conf.getInt("replication.source.getEntries.timeout", 
DEFAULT_TIMEOUT);
 this.shipEditsTimeout = 
this.conf.getInt(HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT,
   HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT_DFAULT);
+this.offsetUpdateIntervalMs =
+  conf.getLong("hbase.replication.shipper.offset.update.interval.ms", 
Long.MAX_VALUE);
+this.offsetUpdateSizeThresholdBytes =
+  conf.getLong("hbase.replication.shipper.offset.update.size.threshold", 
-1L);

Review Comment:
   Extracted these constants as a static final field and added defaults too






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-26 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2994846425


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -106,9 +117,25 @@ public final void run() {
 continue;
   }
   try {
-WALEntryBatch entryBatch = entryReader.poll(getEntriesTimeout);
+// check time-based offset persistence
+if (shouldPersistLogPosition()) {
+  // Trigger offset persistence via existing retry/backoff mechanism 
in shipEdits()
+  WALEntryBatch emptyBatch = createEmptyBatchForTimeBasedFlush();
+  if (emptyBatch != null) shipEdits(emptyBatch);

Review Comment:
   I was thinking of adding tests to the final PR against our continuous 
backup feature branch once I get approval on the code implementation.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-26 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2994831828


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +255,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this worker since it 
might come from

Review Comment:
   > Better introduce a abortAndRestart method in ReplicationSourceShipper to 
call a method in ReplicationSource to replace the current shipper with a new 
one. This can avoid introducing a new monitor thread. And also, we can deal 
with InterruptedException with the same way, as usually if we want to shutdown, 
we will first change a running or close flag and then interrupt, so when we 
want to reschedule, we can check the flag to determine whether we should quit 
now.
   
   @Apache9 @anmolnar @taklwu I have implemented this approach; please help 
review these code changes.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-16 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2943848541


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -90,6 +97,10 @@ public ReplicationSourceShipper(Configuration conf, String 
walGroupId, Replicati
   this.conf.getInt("replication.source.getEntries.timeout", 
DEFAULT_TIMEOUT);
 this.shipEditsTimeout = 
this.conf.getInt(HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT,
   HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT_DFAULT);
+this.offsetUpdateIntervalMs =
+  conf.getLong("hbase.replication.shipper.offset.update.interval.ms", 
Long.MAX_VALUE);
+this.offsetUpdateSizeThresholdBytes =
+  conf.getLong("hbase.replication.shipper.offset.update.size.threshold", 
-1L);

Review Comment:
   HConstants is IA.Public so generally we do not want to add new fields to it 
unless we can confirm they should be accessed by user code directly. But I 
agree that we should extract these constants as a static final field.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +255,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this worker since it 
might come from

Review Comment:
   Better to introduce an abortAndRestart method in ReplicationSourceShipper 
that calls a method in ReplicationSource to replace the current shipper with a 
new one. This avoids introducing a new monitor thread. We can also handle 
InterruptedException the same way: usually when we want to shut down, we first 
set a running or close flag and then interrupt, so when we want to reschedule, 
we can check the flag to determine whether we should quit.
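   The shutdown-vs-restart discrimination described here can be sketched as 
follows, with illustrative names (`running`, `onInterrupted`) rather than the 
actual HBase fields:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptPolicy {
  public final AtomicBoolean running = new AtomicBoolean(true);

  // Orderly shutdown: clear the flag first, then interrupt the worker.
  public void shutdown(Thread worker) {
    running.set(false);
    worker.interrupt();
  }

  // Worker loop decides what an interrupt means: if the flag is still set,
  // the interrupt was not a shutdown, so reschedule; otherwise quit.
  public String onInterrupted() {
    return running.get() ? "reschedule" : "quit";
  }
}
```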



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:
##
@@ -148,6 +151,12 @@ public class ReplicationSource implements 
ReplicationSourceInterface {
   public static final int DEFAULT_WAIT_ON_ENDPOINT_SECONDS = 30;
   private int waitOnEndpointSeconds = -1;
 
+  public static final String SHIPPER_MONITOR_INTERVAL =

Review Comment:
   Since the code in shipper is all implemented by us, we can just reschedule a 
new shipper when it is dead? I mean, we do not need a monitor thread to do this.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-16 Thread via GitHub


taklwu commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2943368507


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -90,6 +97,10 @@ public ReplicationSourceShipper(Configuration conf, String 
walGroupId, Replicati
   this.conf.getInt("replication.source.getEntries.timeout", 
DEFAULT_TIMEOUT);
 this.shipEditsTimeout = 
this.conf.getInt(HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT,
   HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT_DFAULT);
+this.offsetUpdateIntervalMs =
+  conf.getLong("hbase.replication.shipper.offset.update.interval.ms", 
Long.MAX_VALUE);
+this.offsetUpdateSizeThresholdBytes =
+  conf.getLong("hbase.replication.shipper.offset.update.size.threshold", 
-1L);

Review Comment:
   nit: why isn't this feature flag added in `HConstants`?






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-16 Thread via GitHub


taklwu commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2943109855


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -106,9 +117,25 @@ public final void run() {
 continue;
   }
   try {
-WALEntryBatch entryBatch = entryReader.poll(getEntriesTimeout);
+// check time-based offset persistence
+if (shouldPersistLogPosition()) {
+  // Trigger offset persistence via existing retry/backoff mechanism 
in shipEdits()
+  WALEntryBatch emptyBatch = createEmptyBatchForTimeBasedFlush();
+  if (emptyBatch != null) shipEdits(emptyBatch);

Review Comment:
   nit: better to keep braces even if this is a one-liner
   
   ```suggestion
 if (emptyBatch != null) {
   shipEdits(emptyBatch);
 }
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   @ankitsol is this aligned/updated in the latest patch?



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -215,6 +255,13 @@ private void shipEdits(WALEntryBatch entryBatch) {
 entryBatch.getNbOperations(), (endTimeNs - startTimeNs) / 100);
 }
 break;
+  } catch (IOException ioe) {
+// Offset-Persist failure is treated as fatal to this worker since it 
might come from
+// beforePersistingReplicationOffset.
+// ReplicationSource will restart the Shipper, and WAL reading
+// will resume from the last successfully persisted offset
+throw new ReplicationRuntimeException(

Review Comment:
   k, it looks like the previous updateLogPosition issue 
https://github.com/apache/hbase/pull/7617/changes#r2802692158  has been handled 
here






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-10 Thread via GitHub


anmolnar commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-4031807503

   @ankitsol Have you checked the CI output? I see a lot of unit test failures 
related to replication.





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-09 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-402300

   @Apache9 just a gentle ping on my previous comment to check if I'm heading 
in the right direction. Thank you!





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-03-04 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3996883509

   @Apache9 @anmolnar 
   
   I have now added a periodic shipper monitoring task in ReplicationSource 
(`shipperMonitorExecutor`) which checks for dead (non-FINISHED) shipper threads 
and recreates them. When the new shipper starts, it reads the replication 
offset from the persisted offset storage and resumes WAL reading from that 
point, so any WAL entries whose offset was not successfully persisted will be 
replicated again.
   
   So now if the endpoint (for example an S3-based implementation) fails during 
`beforePersistingReplicationOffset()` while committing/flushing data, 
`persistLogPosition()` throws an `IOException`. In `shipEdits()` this 
`IOException` is caught and rethrown as a `ReplicationRuntimeException`. This 
causes the ReplicationSourceShipper worker thread to exit (the run() method 
interrupts and terminates the thread). The monitor then detects the dead worker 
and recreates the shipper, which resumes replication from the last persisted 
offset.
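
   The monitor-and-restart flow described above can be sketched like this (all class and method names here are illustrative assumptions, not the PR's actual `shipperMonitorExecutor` code):

   ```java
   // Illustrative sketch of the restart flow (names are assumptions): a monitor
   // tick recreates the shipper thread once it has died; in the real PR the new
   // shipper resumes from the last persisted replication offset.
   public class ShipperMonitorSketch {
     private Thread shipper;
     private final Runnable task;

     public ShipperMonitorSketch(Runnable task) {
       this.task = task;
       this.shipper = start();
     }

     private Thread start() {
       Thread t = new Thread(task, "replication-shipper");
       t.start();
       return t;
     }

     /** Waits until the current shipper thread has terminated. */
     public void awaitDeath() {
       try {
         shipper.join();
       } catch (InterruptedException ie) {
         Thread.currentThread().interrupt();
       }
     }

     /** One monitor tick: recreates the shipper if it died; true when restarted. */
     public synchronized boolean checkAndRestart() {
       if (!shipper.isAlive()) {
         shipper = start();
         return true;
       }
       return false;
     }
   }
   ```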





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-23 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2844329304


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   I'm not sure if the current logic in ReplicationSource can handle shipper 
abort, but I think this is a possible way to deal with the problem.
   When restarting, we read from the external replication offset storage and 
restart from the offset.
   I think this could also be used to deal with the updateLogPosition 
exception, so we do not need to abort the whole region server.
   






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-23 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2840781191


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   Hi @Apache9, just a gentle ping on my previous comment to check if I'm 
heading in the right direction. Thank you!






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-19 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2827260596


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   The run() method has this catch block to handle `ReplicationRuntimeException`:
   
   ```
   catch (InterruptedException | ReplicationRuntimeException e) {
   // It is interrupted and needs to quit.
   LOG.warn("Interrupted while waiting for next replication entry 
batch", e);
   Thread.currentThread().interrupt();
   }
   ```






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-19 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2827254931


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   Currently in `shipEdits()`, the only place where an IOException can be 
raised is from `persistLogPosition()` (via 
`beforePersistingReplicationOffset()` or `cleanUpHFileRefs()`).
   
   Based on your earlier comment about restarting from the last persisted 
offset and resending WAL entries, I am thinking of splitting the existing catch 
block into two parts:
   
   ```
   catch (IOException ioe) {
 // Persist failure → fatal → restart worker → WAL replay from last 
persisted offset
 throw new ReplicationRuntimeException(
 "Failed to persist replication offset; restarting worker to replay 
WAL", ioe);
   
   } catch (Exception ex) {
 // Replication/transient failure → retry current batch (existing behaviour)
 source.getSourceMetrics().incrementFailedBatches();
 LOG.warn("{} threw unknown exception:",
   source.getReplicationEndpoint().getClass().getName(), ex);
   
 if (sleepForRetries("ReplicationEndpoint threw exception",
 sleepForRetries, sleepMultiplier, maxRetriesMultiplier)) {
   sleepMultiplier++;
 }
   }
   ```
   
   
   The intention is that propagating `ReplicationRuntimeException` will cause 
the shipper worker to exit, and ReplicationSource will recreate it, so WAL 
reading resumes from the last persisted offset and all batches since then are 
resent.
   
   Does this approach match what you had in mind? @Apache9 






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-17 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2819270897


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   Can we just abort this shipper?
   Is `ReplicationSource` going to create a new one if it's aborted?
   In that case it will pick up at the last persisted position and retry 
correctly, won't it?






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-17 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2817632340


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +272,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (accumulatedSizeSinceLastUpdate == 0 || lastShippedBatch == null) {
+  return false;
+}
+
+// Default behaviour to update offset immediately after replicate()
+if (offsetUpdateSizeThresholdBytes == -1 && offsetUpdateIntervalMs == 
Long.MAX_VALUE) {

Review Comment:
   Yes, this is not needed; I just added it here to make the default behaviour 
explicit.
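
   For illustration, the redundancy can be seen in a stripped-down version of the check (simplified from the PR's diff; the method shape is an assumption):

   ```java
   // Stripped-down sketch of shouldPersistLogPosition() (an assumption based on
   // the PR's diff): with sizeThresholdBytes == -1, any positive accumulated
   // size already satisfies the size branch, so the explicit "defaults" check
   // adds nothing.
   public final class OffsetPersistCheck {
     static boolean shouldPersist(long accumulatedBytes, long sizeThresholdBytes,
         long intervalMs, long millisSinceLastUpdate) {
       if (accumulatedBytes == 0) {
         return false; // nothing shipped since the last persisted offset
       }
       return accumulatedBytes >= sizeThresholdBytes || millisSinceLastUpdate >= intervalMs;
     }
   }
   ```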






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-17 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2817553228


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -90,6 +99,22 @@ public ReplicationSourceShipper(Configuration conf, String 
walGroupId, Replicati
   this.conf.getInt("replication.source.getEntries.timeout", 
DEFAULT_TIMEOUT);
 this.shipEditsTimeout = 
this.conf.getInt(HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT,
   HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT_DFAULT);
+ReplicationPeer peer = source.getPeer();
+if (peer != null && peer.getPeerConfig() != null) {

Review Comment:
   Thank you for pointing this out; this merge of the peer config happens in 
ReplicationSourceManager.java.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-12 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2802689035


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +272,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (accumulatedSizeSinceLastUpdate == 0 || lastShippedBatch == null) {
+  return false;
+}
+
+// Default behaviour to update offset immediately after replicate()
+if (offsetUpdateSizeThresholdBytes == -1 && offsetUpdateIntervalMs == 
Long.MAX_VALUE) {

Review Comment:
   This is a bit strange...
   
   If offsetUpdateSizeThresholdBytes is -1, then the check below, 
accumulatedSizeSinceLastUpdate >= offsetUpdateSizeThresholdBytes, will always 
return true, so we do not need this check here?



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -105,10 +130,17 @@ public final void run() {
 sleepForRetries("Replication is disabled", sleepForRetries, 1, 
maxRetriesMultiplier);
 continue;
   }
+  // check time-based offset persistence
+  if (shouldPersistLogPosition()) {

Review Comment:
   So this wants to solve the problem that, since we may not always persist the 
offset when shipping, it is possible that we shipped out a batch and then did 
not get a new batch for a very long time; if we do not add a check here, there 
is no way for us to persist the offset?
   
   I think we also need to change the poll timeout below: we should not wait 
longer than the remaining time of the offset update interval?
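
   The bounded poll the reviewer suggests might look like this sketch (method and parameter names are assumptions, not the PR's code): the entry-reader wait is capped by the time left until the next scheduled offset persistence.

   ```java
   // Sketch (an assumption following the reviewer's suggestion): cap the
   // entry-reader poll timeout by the time remaining in the offset update
   // interval, so a quiet WAL cannot delay the periodic persist.
   public final class PollTimeoutSketch {
     static long boundedPollTimeout(long getEntriesTimeoutMs, long offsetUpdateIntervalMs,
         long lastPersistTs, long nowTs) {
       long remaining = Math.max(0, offsetUpdateIntervalMs - (nowTs - lastPersistTs));
       return Math.min(getEntriesTimeoutMs, remaining);
     }
   }
   ```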



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -90,6 +99,22 @@ public ReplicationSourceShipper(Configuration conf, String 
walGroupId, Replicati
   this.conf.getInt("replication.source.getEntries.timeout", 
DEFAULT_TIMEOUT);
 this.shipEditsTimeout = 
this.conf.getInt(HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT,
   HConstants.REPLICATION_SOURCE_SHIPEDITS_TIMEOUT_DFAULT);
+ReplicationPeer peer = source.getPeer();
+if (peer != null && peer.getPeerConfig() != null) {

Review Comment:
   The conf here already contains all the configurations in the peer config. You 
can see the implementation of ReplicationPeers for more details; we will create 
a CompoundConfiguration and apply peerConfig.getConfiguration() to it.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   Why move this here?



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -154,15 +196,16 @@ protected void postFinish() {
   private void shipEdits(WALEntryBatch entryBatch) {
 List entries = entryBatch.getWalEntries();
 int sleepMultiplier = 0;
-if (entries.isEmpty()) {
-  updateLogPosition(entryBatch);
-  return;
-}
 int currentSize = (int) entryBatch.getHeapSize();
 source.getSourceMetrics()
   .setTimeStampNextToReplicate(entries.get(entries.size() - 
1).getKey().getWriteTime());
 while (isActive()) {
   try {
+if (entries.isEmpty()) {

Review Comment:
   OK, I guess I know why you moved this here: in the past updateLogPosition could 
not fail (it would lead to an abort, so we can assume it never fails), but now 
that we have introduced a callback method, it can fail.
   
   Then I do not think this is the correct way to deal with it. Consider the 
S3-based solution: if you fail to commit the file on S3, the correct way is to 
send the data again, but here we just retry committing... I think we should 
restart from the previously persisted offset and send the data again.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-12 Thread via GitHub


Apache9 commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3894748231

   Will take a look this afternoon.
   
   Please give me some time.
   
   Thanks.





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-10 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2789440736


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+long maxBufferSize = endpoint.getMaxBufferSize();
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+if (maxBufferSize == -1) {
+  return true;
+}
+return stagedWalSize >= maxBufferSize
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  private void persistLogPosition() throws IOException {
+if (lastShippedBatch == null) {

Review Comment:
   We cannot. This method is not getting called if lastShippedBatch is null in 
the current implementation. We can still keep this check here for safety 
reasons.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-06 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3860012449

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
    _ Prechecks _ |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 34s |  |  master passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 26s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   4m 21s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  28m 34s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   |  46m 44s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux fda2279b6c5b 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 640308bf96121abbabd3f9fcad9ba094699af23d |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/testReport/
 |
   | Max. process+thread count | 1419 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-06 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3859957472

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 55s |  |  master passed  |
   | +1 :green_heart: |  compile  |   2m 54s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 33s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 46s |  |  branch has no errors when 
running spotless:check.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 56s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 56s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 48s | 
[/results-checkstyle-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/artifact/yetus-general-check/output/results-checkstyle-hbase-server.txt)
 |  hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total 
(was 0)  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  hadoopcheck  |   9m 21s |  |  Patch does not cause any 
errors with Hadoop 3.3.6 3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 39s |  |  patch has no errors when 
running spotless:check.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 10s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  33m 19s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 867472c88f7c 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 640308bf96121abbabd3f9fcad9ba094699af23d |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | Max. process+thread count | 85 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/8/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-05 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3858344435

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  5s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
    _ Prechecks _ |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 38s |  |  master passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 57s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   1m 44s | 
[/patch-mvninstall-root.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/patch-mvninstall-root.txt)
 |  root in the patch failed.  |
   | -1 :x: |  compile  |   0m 59s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -0 :warning: |  javac  |   0m 59s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed  |
   | -1 :x: |  shadedjars  |   4m 29s |  |  patch has 16 errors when building 
our shaded downstream artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   1m  0s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   |  21m 27s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux d235f58c2160 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / cfc9d908eb46fd943e83f0f6288d1eecc3ce16ad |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | shadedjars | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-jdk17-hadoop3-check/output/patch-shadedjars.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/testReport/
 |
   | Max. process+thread count | 80 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-05 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3858309287

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 12s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
   |||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 15s |  |  master passed  |
   | +1 :green_heart: |  compile  |   2m 34s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 34s |  |  branch has no errors when 
running spotless:check.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   1m 15s | 
[/patch-mvninstall-root.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-mvninstall-root.txt)
 |  root in the patch failed.  |
   | -1 :x: |  compile  |   1m 24s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -0 :warning: |  javac  |   1m 24s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 45s | 
[/results-checkstyle-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/results-checkstyle-hbase-server.txt)
 |  hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total 
(was 0)  |
   | -1 :x: |  spotbugs  |   0m 38s | 
[/patch-spotbugs-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-spotbugs-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -1 :x: |  hadoopcheck  |   1m 28s |  |  The patch causes 16 errors with 
Hadoop v3.3.6.  |
   | -1 :x: |  hadoopcheck  |   2m 57s |  |  The patch causes 16 errors with 
Hadoop v3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 33s |  |  patch has no errors when 
running spotless:check.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  6s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  16m 26s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 8c4a7b145330 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / cfc9d908eb46fd943e83f0f6288d1eecc3ce16ad |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | hadoopcheck | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-javac-3.3.6.txt
 |
   | hadoopcheck | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/artifact/yetus-general-check/output/patch-javac-3.4.1.txt
 |
   | Max. process+thread count | 85 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/7/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-05 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3853691977

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 11s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
   |||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 18s |  |  master passed  |
   | +1 :green_heart: |  compile  |   2m 38s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 35s |  |  branch has no errors when 
running spotless:check.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   1m 15s | 
[/patch-mvninstall-root.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-mvninstall-root.txt)
 |  root in the patch failed.  |
   | -1 :x: |  compile  |   1m 23s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -0 :warning: |  javac  |   1m 23s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 45s | 
[/results-checkstyle-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/results-checkstyle-hbase-server.txt)
 |  hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total 
(was 0)  |
   | -1 :x: |  spotbugs  |   0m 39s | 
[/patch-spotbugs-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-spotbugs-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -1 :x: |  hadoopcheck  |   1m 29s |  |  The patch causes 16 errors with 
Hadoop v3.3.6.  |
   | -1 :x: |  hadoopcheck  |   2m 57s |  |  The patch causes 16 errors with 
Hadoop v3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 33s |  |  patch has no errors when 
running spotless:check.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  6s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  16m 27s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 5ca63016945b 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f7e81880f2fb14a4277b9ce69ca597e82e3b7854 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | hadoopcheck | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-javac-3.3.6.txt
 |
   | hadoopcheck | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-general-check/output/patch-javac-3.4.1.txt
 |
   | Max. process+thread count | 84 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-05 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3853686338

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 12s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
   |||| _ Prechecks _ |
   |||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 37s |  |  master passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 28s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   1m 16s | 
[/patch-mvninstall-root.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/patch-mvninstall-root.txt)
 |  root in the patch failed.  |
   | -1 :x: |  compile  |   0m 46s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | -0 :warning: |  javac  |   0m 46s | 
[/patch-compile-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/patch-compile-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  shadedjars  |   3m 18s |  |  patch has 16 errors when building 
our shaded downstream artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   0m 46s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   |  15m 39s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux 542437302e36 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f7e81880f2fb14a4277b9ce69ca597e82e3b7854 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | shadedjars | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/artifact/yetus-jdk17-hadoop3-check/output/patch-shadedjars.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/testReport/
 |
   | Max. process+thread count | 75 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/6/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-02-05 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3853556817

   I have moved both the size- and time-based configs to the ReplicationEndpoint 
peerConfig, with defaults in ReplicationSourceShipper.
   
   I am now re-using the ReplicationSourceShipper#shipEdits() retry mechanism for 
handling exceptions.
   
   Please help me with the code review @Apache9 
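
   For illustration only, the peer-config-over-defaults split described above could 
look like the sketch below. The config key names and the `resolve` helper are 
invented for this example; the patch's actual keys are not shown in this thread.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class PeerConfigDefaultsSketch {
       // Invented key names, for illustration only.
       static final String MAX_BUFFER_KEY = "replication.shipper.max.buffer.size";
       static final String MAX_INTERVAL_KEY = "replication.shipper.max.flush.interval";

       // Shipper-side defaults (e.g. -1 for "disabled") preserve existing behavior
       // unless the peer config supplies an explicit threshold.
       static long resolve(Map<String, String> peerConfig, String key, long dflt) {
           String v = peerConfig.get(key);
           return v == null ? dflt : Long.parseLong(v);
       }

       public static void main(String[] args) {
           Map<String, String> peer = new HashMap<>();
           // No peer-level override: the shipper default wins.
           if (resolve(peer, MAX_BUFFER_KEY, -1L) != -1L) throw new AssertionError();
           // Peer config supplies a value: it overrides the default.
           peer.put(MAX_BUFFER_KEY, "1048576");
           if (resolve(peer, MAX_BUFFER_KEY, -1L) != 1048576L) throw new AssertionError();
           System.out.println("ok");
       }
   }
   ```

   The point of the split is that a custom (e.g. buffered) endpoint can ship its own 
tuned thresholds per peer, while peers without overrides keep the default 
persist-per-batch behavior.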





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2739954159


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();

Review Comment:
   Sure, so does it make sense to handle this via Peer Configuration?
   
   I can add the timeout/size logic to ReplicationSourceShipper, but have the 
thresholds default to -1 (preserving existing behavior). For my custom 
endpoint, I’ll pass specific values via the Peer Config. 






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736942086


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -98,6 +103,14 @@ public final void run() {
 LOG.info("Running ReplicationSourceShipper Thread for wal group: {}", 
this.walGroupId);
 // Loop until we close down
 while (isActive()) {
+  // Whether to persist replication offsets based on size/time thresholds
+  if (shouldPersistLogPosition()) {
+try {
+  persistLogPosition();
+} catch (IOException e) {
+  LOG.warn("Exception while persisting replication state", e);

Review Comment:
   In persistLogPosition, we will call the callback method in 
ReplicationEndpoint, and for S3 based replication endpoint, you will close the 
S3 file right? This operation could throw exceptions?






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736930436


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+long maxBufferSize = endpoint.getMaxBufferSize();
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+if (maxBufferSize == -1) {
+  return true;
+}
+return stagedWalSize >= maxBufferSize
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  private void persistLogPosition() throws IOException {
+if (lastShippedBatch == null) {

Review Comment:
   Ah, OK, you do not reset the lastShippedBatch when reading a new batch. But 
it still makes me a bit nervous about how we can get here when lastShippedBatch 
is null...
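
   The size/time trigger being discussed can be sketched as a pure function. This is 
a hypothetical, self-contained restatement of the `shouldPersistLogPosition()` check 
(the real PR keeps this state on the shipper; parameter names here are invented):

   ```java
   public class FlushTriggerSketch {
       // Persist once either the staged WAL bytes cross maxBufferSize, or
       // maxFlushIntervalMs has elapsed since the last flush. maxBufferSize == -1
       // disables the thresholds and persists after every shipped batch.
       static boolean shouldPersist(long stagedWalSize, boolean hasShippedBatch,
           long maxBufferSize, long maxFlushIntervalMs, long nowMs, long lastFlushTsMs) {
           if (stagedWalSize == 0 || !hasShippedBatch) {
               return false; // nothing shipped/staged yet, so no offset to persist
           }
           if (maxBufferSize == -1) {
               return true; // thresholds disabled: keep persist-per-batch behavior
           }
           return stagedWalSize >= maxBufferSize
               || (nowMs - lastFlushTsMs) >= maxFlushIntervalMs;
       }

       public static void main(String[] args) {
           check(!shouldPersist(0, true, 1024, 60_000, 1_000, 0));   // nothing staged
           check(shouldPersist(10, true, -1, 60_000, 1_000, 0));     // thresholds disabled
           check(!shouldPersist(512, true, 1024, 60_000, 1_000, 0)); // below both thresholds
           check(shouldPersist(2048, true, 1024, 60_000, 1_000, 0)); // size threshold crossed
           check(shouldPersist(10, true, 1024, 60_000, 70_000, 0));  // interval elapsed
           System.out.println("ok");
       }

       private static void check(boolean cond) {
           if (!cond) throw new AssertionError();
       }
   }
   ```

   Under this reading, the `hasShippedBatch` guard is what makes the 
null-lastShippedBatch case unreachable inside the persist path.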






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736924354


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();

Review Comment:
   We only need this configuration in the shipper, as persisting the log position 
is the duty of the shipper, not the endpoint. And there is no correctness problem if 
you implement the endpoint correctly, since when persisting the log position 
you will flush the files in the endpoint. The problem is that performance may 
be bad. So when using a different replication endpoint implementation, you should 
tune the shipper configuration for the persistence interval.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736370914


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -98,6 +103,14 @@ public final void run() {
 LOG.info("Running ReplicationSourceShipper Thread for wal group: {}", 
this.walGroupId);
 // Loop until we close down
 while (isActive()) {
+  // Whether to persist replication offsets based on size/time thresholds
+  if (shouldPersistLogPosition()) {
+try {
+  persistLogPosition();
+} catch (IOException e) {
+  LOG.warn("Exception while persisting replication state", e);

Review Comment:
   persistLogPosition() throws an exception only from cleanUpHFileRefs(), where 
getBulkLoadDescriptor() can throw, hence I thought a retry was not 
necessary. I am thinking of keeping this as it is, but moving cleanUpHFileRefs() 
towards the end of the persistLogPosition() definition (after we update the offset). 
Please let me know your thoughts.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736351890


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();

Review Comment:
   Ok, so we need to have these configs (the time- and size-based ones) at both 
ends (ReplicationSourceShipper and ReplicationEndpoint), so we can achieve the 
tuning you mention in your previous comment: 
https://github.com/apache/hbase/pull/7617#discussion_r2719322657
   
   > For me, I do not think we need to expose this information to shipper?
   > 
   > The design here is that, when using different ReplicationEndpoint, you 
need to tune the shipper configuration by your own, as the parameters are not 
only affected by ReplicationEndpoint, they also depend on the shipper side.
   > 
   > For example, when you want to reduce the pressure on recording the offset, 
you should increase the record interval, i.e, increase batch size, increase the 
number of ship times between recording offset, etc. And if you want to reduce 
the pressure on memory and the target receiver, you should decrease the batch 
size, and for S3 based replication endpoint, there is also a trade off, if you 
increase the flush interval, you can get better performance and less files on 
S3, but failover will be more complicated as you need to start from long before.
   > 
   > So, this should be in the documentation, just exposing some configuration 
from ReplicationEndpoint can not handle all the above situations.
   
   






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736357408


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();

Review Comment:
   @Apache9 Please let me know if my understanding is correct here








Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


ankitsol commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736314130


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+long maxBufferSize = endpoint.getMaxBufferSize();
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+if (maxBufferSize == -1) {
+  return true;
+}
+return stagedWalSize >= maxBufferSize
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  private void persistLogPosition() throws IOException {
+if (lastShippedBatch == null) {

Review Comment:
   As far as I understand, lastShippedBatch being 'null' means no batch has been 
replicated yet, so we don't need to update the offset. Please correct me if I am 
wrong here.
   
   lastShippedBatch is 'null' by default during ReplicationSourceShipper 
initialisation, and it is updated as soon as a batch is replicated.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736093265


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,35 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  /**
+   * @return true if this endpoint buffers WAL entries and requires explicit 
flush control before
+   * persisting replication offsets.
+   */
+  default boolean isBufferedReplicationEndpoint() {
+return false;
+  }
+
+  /**
+   * Maximum WAL size (bytes) to buffer before forcing a flush. Only 
meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long getMaxBufferSize() {
+return -1L;
+  }
+
+  /**
+   * Maximum time (ms) to wait before forcing a flush. Only meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long maxFlushInterval() {
+return Long.MAX_VALUE;
+  }
+
+  /**
+   * Hook invoked before persisting replication offsets. Buffered endpoints 
should flush/close WALs
+   * here.
+   */
+  default void beforePersistingReplicationOffset() {

Review Comment:
   Ok, so maybe call it `beforeUpdatingLogPosition`? And let's be more detailed 
in the javadoc comments, like saying this is called by the shipper when it's 
about to move the offset forward in the wal reader and potentially make wal 
files that were fully read eligible for deletion. 
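
   For illustration, the default-method shape under review could be exercised as in 
the sketch below. The `FlushControl` interface is a simplified stand-in, not the 
real ReplicationEndpoint, and `S3LikeEndpoint` is a hypothetical buffered 
implementation; only the method names mirror the patch.

   ```java
   public class BufferedEndpointSketch {
       // Simplified stand-in for the ReplicationEndpoint additions in this PR.
       interface FlushControl {
           default boolean isBufferedReplicationEndpoint() { return false; }
           default long getMaxBufferSize() { return -1L; }
           default long maxFlushInterval() { return Long.MAX_VALUE; }
           // Hook the shipper calls right before moving the log position forward;
           // buffered endpoints flush/close their staged output here, since fully
           // read WAL files may then become eligible for deletion.
           default void beforePersistingReplicationOffset() {}
       }

       // A hypothetical S3-style buffered endpoint overriding the defaults.
       static class S3LikeEndpoint implements FlushControl {
           boolean flushed = false;
           @Override public boolean isBufferedReplicationEndpoint() { return true; }
           @Override public long getMaxBufferSize() { return 64L * 1024 * 1024; }
           @Override public long maxFlushInterval() { return 60_000L; }
           @Override public void beforePersistingReplicationOffset() { flushed = true; }
       }

       public static void main(String[] args) {
           // A plain endpoint keeps the no-op defaults.
           FlushControl plain = new FlushControl() {};
           if (plain.isBufferedReplicationEndpoint()) throw new AssertionError();
           // A buffered endpoint flushes when the shipper invokes the hook.
           S3LikeEndpoint s3 = new S3LikeEndpoint();
           s3.beforePersistingReplicationOffset();
           if (!s3.flushed) throw new AssertionError();
           System.out.println("ok");
       }
   }
   ```

   The hook-based design keeps the persistence decision in the shipper while letting 
the endpoint make its buffered output durable just before the offset moves.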






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736079118


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+long maxBufferSize = endpoint.getMaxBufferSize();
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+if (maxBufferSize == -1) {
+  return true;
+}
+return stagedWalSize >= maxBufferSize
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= endpoint.maxFlushInterval());
+  }
+
+  private void persistLogPosition() throws IOException {
+if (lastShippedBatch == null) {
+  return;
+}
+
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+endpoint.beforePersistingReplicationOffset();
+
+// Clean up hfile references
+for (Entry entry : entriesForCleanUpHFileRefs) {
+  cleanUpHFileRefs(entry.getEdit());
+  LOG.trace("shipped entry {}: ", entry);

Review Comment:
   Please provide a more accurate message. This is not what really happened 
here.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2736040903


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();

Review Comment:
   Here we should use configuration values instead of getting from 
ReplicationEndpoint. We can have default configuration values to keep the old 
behavior for normal replication.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +247,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+long maxBufferSize = endpoint.getMaxBufferSize();
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+if (maxBufferSize == -1) {
+  return true;
+}
+return stagedWalSize >= maxBufferSize
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= endpoint.maxFlushInterval());
+  }
+
+  private void persistLogPosition() throws IOException {
+if (lastShippedBatch == null) {

Review Comment:
   Since we could accumulate different batches in the above loop, a null batch 
does not mean we haven't shipped anything out? Why do we just return here if 
lastShippedBatch is null?



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -98,6 +103,14 @@ public final void run() {
LOG.info("Running ReplicationSourceShipper Thread for wal group: {}", this.walGroupId);
 // Loop until we close down
 while (isActive()) {
+  // Whether to persist replication offsets based on size/time thresholds
+  if (shouldPersistLogPosition()) {
+try {
+  persistLogPosition();
+} catch (IOException e) {
+  LOG.warn("Exception while persisting replication state", e);

Review Comment:
   This is not enough for handling the exception? Typically we should restart 
from the last persisted offset and replicate again.
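   One way to realize the suggested handling, shown as a hedged sketch: on a
persistence failure, rewind the shipper's in-memory position to the last
persisted offset so everything after it is read and replicated again. Names
and structure are illustrative stand-ins, not the actual HBase code paths.

```java
import java.io.IOException;

class OffsetRewindSketch {
  /** Stand-in for the real offset store; may fail transiently. */
  interface OffsetStore {
    void persist(long offset) throws IOException;
  }

  long persistedOffset; // last offset known to be durably recorded
  long shippedOffset;   // how far this shipper has shipped

  /** Try to persist; on failure, fall back to the last persisted offset. */
  void tryPersist(OffsetStore store) {
    try {
      store.persist(shippedOffset);
      persistedOffset = shippedOffset;
    } catch (IOException e) {
      // Rewind so the reader restarts from a known-good position and
      // re-replicates the un-persisted range instead of silently skipping it.
      shippedOffset = persistedOffset;
    }
  }
}
```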






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-28 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2735996225


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   > Are you referring to your original comment about passing the entire entryBatch object?
   
   No, I meant [this check](https://github.com/apache/hbase/pull/7617/changes/8c828290cc7ab36b9db75a72916d6bbfd1dc4e47#diff-1907878c880a164eafc9eb5fb412a376d1d504bf54bfe91192e36693e60dfaf8L251) that was being done inside "shouldPersistLogPosition", which would cause the buffer size to be only considered for specific endpoint types.
   
   @ankitsol has already addressed it on a recent commit.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3807670294

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
    _ Prechecks _ |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 33s |  |  master passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 58s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m  6s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 59s |  |  the patch passed  |
   | -0 :warning: |  javadoc  |   0m 27s | 
[/results-javadoc-javadoc-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/artifact/yetus-jdk17-hadoop3-check/output/results-javadoc-javadoc-hbase-server.txt)
 |  hbase-server generated 1 new + 65 unchanged - 0 fixed = 66 total (was 65)  |
   | +1 :green_heart: |  shadedjars  |   5m 52s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 230m 20s |  |  hbase-server in the patch 
passed.  |
   |  |   | 257m 36s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux a908a4083825 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / e74eee1698d3bff65b1d1b05e4b1982fe450c72b |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/testReport/
 |
   | Max. process+thread count | 4270 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3806617854

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 15s |  |  master passed  |
   | +1 :green_heart: |  compile  |   2m 35s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 34s |  |  branch has no errors when 
running spotless:check.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  hadoopcheck  |   8m 30s |  |  Patch does not cause any 
errors with Hadoop 3.3.6 3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 34s |  |  patch has no errors when 
running spotless:check.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  8s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  29m  3s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 3133a65ee296 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / e74eee1698d3bff65b1d1b05e4b1982fe450c72b |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | Max. process+thread count | 85 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/5/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2733050407


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= endpoint.maxFlushInterval());
+  }
+
+  @Nullable
+  // Returns IOException instead of throwing so callers can decide
+  // whether to retry (shipEdits) or best-effort log (run()).
+  private IOException persistLogPosition() {
+if (lastShippedBatch == null) {
+  return null;
+}
+
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+
+if (endpoint.isBufferedReplicationEndpoint() && stagedWalSize > 0) {
+  source.getReplicationEndpoint().beforePersistingReplicationOffset();
+}

Review Comment:
   I think Duo's [comment](https://github.com/apache/hbase/pull/7617#discussion_r2721491163) applies here as well.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2733047546


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   > That would be fine for me, but then we would need to adjust the shouldPersistLogPosition method accordingly.
   
   Are you referring to your original comment about passing the entire `entryBatch` object?






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3805694243

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 11s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
    _ Prechecks _ |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 23s |  |  master passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 25s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 46s |  |  the patch passed  |
   | -0 :warning: |  javadoc  |   0m 21s | 
[/results-javadoc-javadoc-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/artifact/yetus-jdk17-hadoop3-check/output/results-javadoc-javadoc-hbase-server.txt)
 |  hbase-server generated 1 new + 65 unchanged - 0 fixed = 66 total (was 65)  |
   | +1 :green_heart: |  shadedjars  |   4m 25s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 216m  8s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   | 236m  1s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.52 ServerAPI=1.52 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux 127c6853cf95 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 8c828290cc7ab36b9db75a72916d6bbfd1dc4e47 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/testReport/
 |
   | Max. process+thread count | 6323 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-27 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3804659205

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m  8s |  |  master passed  |
   | +1 :green_heart: |  compile  |   2m 52s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 43s |  |  branch has no errors when 
running spotless:check.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 47s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  hadoopcheck  |   9m 26s |  |  Patch does not cause any 
errors with Hadoop 3.3.6 3.4.1.  |
   | -1 :x: |  spotless  |   0m 32s | 
[/patch-spotless.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/artifact/yetus-general-check/output/patch-spotless.txt)
 |  patch has 24 errors when running spotless:check, run spotless:apply to fix. 
 |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 10s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  33m  9s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.53 ServerAPI=1.53 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 11fb3d023937 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 8c828290cc7ab36b9db75a72916d6bbfd1dc4e47 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | Max. process+thread count | 84 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/4/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-26 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2727222631


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   Oh, so your idea is to allow the shipper to decide when to persist the log 
position based on time and/or storage usage by WALs, regardless of the 
endpoint implementation? That would be fine for me, but then we would need to 
adjust the shouldPersistLogPosition method accordingly.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-24 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2724276907


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   I think time-based and size-based persistence is enough for most cases? If 
in the future we have some special endpoint which needs a new type of decision 
mechanism, we can add it then, no problem.
   
   The problem with only triggering persistence from the endpoint is that we 
have other considerations about when to persist the log position, like the 
trade-off between failover cost and pressure on the replication storage. So 
here I suggest that we introduce general mechanisms to control when the log 
position is persisted; users can then tune them for different endpoints and 
workloads.
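   Under that design, an operator would tune persistence per deployment via
configuration rather than code. A hypothetical hbase-site.xml fragment (these
property names are invented for illustration, not actual HBase keys):

```xml
<!-- Hypothetical keys: persist the offset once 64 MB of WAL data has been
     shipped since the last persisted position, or at least once a minute. -->
<property>
  <name>hbase.replication.shipper.persist.max.staged.size</name>
  <value>67108864</value>
</property>
<property>
  <name>hbase.replication.shipper.persist.max.interval.ms</name>
  <value>60000</value>
</property>
```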






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-23 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2721937683


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   IMHO, it doesn't look very cohesive. The shipper seems to be taking 
decisions based on specific endpoint implementations. If new endpoint 
implementations with different logic for updating the log position are 
conceived in the future, we would need to revisit the shipper again.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-23 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2721491163


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   We want to determine whether we need to persist the log position in the 
shipper, based on some configuration, rather than having it triggered by the 
replication endpoint. Users can choose different configuration values for 
different replication endpoint implementations.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-23 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2721486401


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= endpoint.maxFlushInterval());
+  }

Review Comment:
   Please see my comment above to get more context
   
   > For me, I do not think we need to expose this information to the shipper.
   >
   > The design here is that, when using a different ReplicationEndpoint, you 
need to tune the shipper configuration on your own, as the parameters are not 
only affected by the ReplicationEndpoint; they also depend on the shipper side.
   >
   > For example, when you want to reduce the pressure on recording the 
offset, you should increase the record interval, i.e., increase the batch 
size, increase the number of ship operations between offset recordings, etc. 
And if you want to reduce the pressure on memory and the target receiver, you 
should decrease the batch size. For an S3-based replication endpoint there is 
also a trade-off: if you increase the flush interval, you get better 
performance and fewer files on S3, but failover becomes more complicated as 
you need to restart from much earlier.
   >
   > So, this should be in the documentation; just exposing some configuration 
from the ReplicationEndpoint cannot handle all the above situations.
   
   






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-23 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2721481232


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,35 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  /**
+   * @return true if this endpoint buffers WAL entries and requires explicit flush control before
+   * persisting replication offsets.
+   */
+  default boolean isBufferedReplicationEndpoint() {
+return false;
+  }
+
+  /**
+   * Maximum WAL size (bytes) to buffer before forcing a flush. Only meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long getMaxBufferSize() {
+return -1L;
+  }
+
+  /**
+   * Maximum time (ms) to wait before forcing a flush. Only meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long maxFlushInterval() {
+return Long.MAX_VALUE;
+  }
+
+  /**
+   * Hook invoked before persisting replication offsets. Buffered endpoints 
should flush/close WALs
+   * here.
+   */
+  default void beforePersistingReplicationOffset() {

Review Comment:
   Sorry, this is exactly what we want to avoid here. This method just tells 
the endpoint what the shipper is going to do, and the endpoint can do anything 
it wants. For our normal replication framework, there are no flush and close 
operations, as we always send everything out and do not return until we get 
acks, so basically we do not need to implement this method. And for an S3-based 
replication endpoint, we need to close the file to persist it on S3. Maybe in 
the future we will have other types of replication endpoints which do other 
work, so we do not want to name it `flushAndCloseWAL`.
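
   The callback contract described here can be sketched as a small standalone 
example; the interface and class names below are simplified stand-ins for 
HBase's ReplicationEndpoint, not the real API:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHookSketch {

  // Simplified stand-in for ReplicationEndpoint: the shipper only announces
  // that it is about to persist the offset; the endpoint decides what to do.
  interface Endpoint {
    // Default is a no-op: normal replication endpoints have already received
    // acks for everything shipped, so there is nothing to flush.
    default void beforePersistingReplicationOffset() {
    }
  }

  // Hypothetical buffered endpoint (e.g. an S3-style backup): it must
  // flush/close its staged file before the offset is recorded, otherwise a
  // failover could skip entries that were never durably written.
  static class BufferedEndpoint implements Endpoint {
    final List<String> staged = new ArrayList<>();
    final List<String> persisted = new ArrayList<>();

    void append(String entry) {
      staged.add(entry);
    }

    @Override
    public void beforePersistingReplicationOffset() {
      // "flush and close" the staged file
      persisted.addAll(staged);
      staged.clear();
    }
  }

  public static void main(String[] args) {
    BufferedEndpoint ep = new BufferedEndpoint();
    ep.append("wal-entry-1");
    ep.append("wal-entry-2");
    // Shipper side: announce intent before recording the offset.
    ep.beforePersistingReplicationOffset();
    if (!ep.staged.isEmpty() || ep.persisted.size() != 2) {
      throw new AssertionError("buffered entries were not persisted");
    }
    System.out.println("persisted=" + ep.persisted.size()); // prints persisted=2
  }
}
```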






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-23 Thread via GitHub


wchevreuil commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2720780067


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  @Nullable
+  // Returns IOException instead of throwing so callers can decide
+  // whether to retry (shipEdits) or best-effort log (run()).
+  private IOException persistLogPosition() {
+if (lastShippedBatch == null) {
+  return null;
+}
+
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+
+if (endpoint.isBufferedReplicationEndpoint() && stagedWalSize > 0) {
+  source.getReplicationEndpoint().beforePersistingReplicationOffset();
+}

Review Comment:
   Also should be in the endpoint.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }

Review Comment:
   This should be in the Endpoint, as the decision varies according to the type 
of endpoint. Today we have basically two types, buffered and non-buffered. If 
we have new endpoint types in the future, we might again need to come here and 
add the related logic.
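
   A sketch of what pushing this decision into the endpoint could look like; 
the names below are simplified stand-ins, not the actual HBase interfaces. The 
shipper would just pass the staged size and elapsed time, and each endpoint 
type encapsulates its own policy:

```java
public class PersistPolicySketch {

  // Simplified stand-in for ReplicationEndpoint: each endpoint type owns
  // its persist-offset policy instead of the shipper branching on it.
  interface Endpoint {
    // Non-buffering default: persist after every shipped batch.
    default boolean shouldPersistLogPosition(long stagedBytes, long msSinceLastPersist) {
      return true;
    }
  }

  // Hypothetical buffered endpoint: persist only once the staged size or
  // the elapsed time crosses the configured threshold.
  static class BufferedEndpoint implements Endpoint {
    final long maxBufferBytes;
    final long maxFlushIntervalMs;

    BufferedEndpoint(long maxBufferBytes, long maxFlushIntervalMs) {
      this.maxBufferBytes = maxBufferBytes;
      this.maxFlushIntervalMs = maxFlushIntervalMs;
    }

    @Override
    public boolean shouldPersistLogPosition(long stagedBytes, long msSinceLastPersist) {
      if (stagedBytes == 0) {
        return false; // nothing staged, nothing to persist
      }
      return stagedBytes >= maxBufferBytes || msSinceLastPersist >= maxFlushIntervalMs;
    }
  }

  public static void main(String[] args) {
    Endpoint plain = new Endpoint() {
    };
    Endpoint buffered = new BufferedEndpoint(1024, 60_000);
    System.out.println(plain.shouldPersistLogPosition(10, 0));      // prints true
    System.out.println(buffered.shouldPersistLogPosition(10, 0));   // prints false
    System.out.println(buffered.shouldPersistLogPosition(2048, 0)); // prints true
  }
}
```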



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,35 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  /**
+   * @return true if this endpoint buffers WAL entries and requires explicit 
flush control before
+   * persisting replication offsets.
+   */
+  default boolean isBufferedReplicationEndpoint() {
+return false;
+  }
+
+  /**
+   * Maximum WAL size (bytes) to buffer before forcing a flush. Only 
meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long getMaxBufferSize() {
+return -1L;
+  }
+
+  /**
+   * Maximum time (ms) to wait before forcing a flush. Only meaningful when
+   * isBufferedReplicationEndpoint() == true.
+   */
+  default long maxFlushInterval() {
+return Long.MAX_VALUE;
+  }
+
+  /**
+   * Hook invoked before persisting replication offsets. Buffered endpoints 
should flush/close WALs
+   * here.
+   */
+  default void beforePersistingReplicationOffset() {

Review Comment:
   From the endpoints view, this method is just a flush/close of the given wal, 
so lets name it accordingly.
   ```suggestion
 default void flushAndCloseWAL() {
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -190,13 +204,16 @@ private void shipEdits(WALEntryBatch entryBatch) {
 } else {
   sleepMultiplier = Math.max(sleepMultiplier - 1, 0);
 }
-// Clean up hfile references
-for (Entry entry : entries) {
-  cleanUpHFileRefs(entry.getEdit());
-  LOG.trace("shipped entry {}: ", entry);
+
+stagedWalSize += currentSize;
+entriesForCleanUpHFileRefs.addAll(entries);
+lastShippedBatch = entryBatch;
+if (shouldPersistLogPosition()) {

Review Comment:
   Rather than having these `stagedWalSize` and `lastShippedBatch` as global 
variables, we should just pass the `entryBatch` along to 
`shouldPersistLogPosition()` (which should be defined/implemented in the 
endpoints, btw, see my other comment related) and `persistLogPosition()`.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-22 Thread via GitHub


Apache9 commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2719322657


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,35 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  /**
+   * @return true if this endpoint buffers WAL entries and requires explicit 
flush control before
+   * persisting replication offsets.
+   */
+  default boolean isBufferedReplicationEndpoint() {

Review Comment:
   For me, I do not think we need to expose this information to the shipper?
   
   The design here is that, when using different ReplicationEndpoints, you need 
to tune the shipper configuration on your own, as the parameters are not only 
affected by the ReplicationEndpoint; they also depend on the shipper side.
   
   For example, when you want to reduce the pressure of recording the offset, 
you should increase the recording interval, i.e., increase the batch size, 
increase the number of ship operations between offset recordings, etc. And if 
you want to reduce the pressure on memory and the target receiver, you should 
decrease the batch size. For an S3-based replication endpoint there is also a 
trade-off: if you increase the flush interval, you get better performance and 
fewer files on S3, but failover becomes more complicated, as you need to 
restart from much earlier.
   
   So, this should be in the documentation; just exposing some configuration 
from the ReplicationEndpoint cannot handle all the above situations.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -98,6 +104,13 @@ public final void run() {
 LOG.info("Running ReplicationSourceShipper Thread for wal group: {}", 
this.walGroupId);
 // Loop until we close down
 while (isActive()) {
+  // check if flush needed for WAL backup, this is need for timeout based 
flush

Review Comment:
   This is not designed for WAL backup only, I need to say. Here, in the 
shipper, we just follow the configured limit, i.e., time based or size based, 
to determine whether we need to record the log position, and there is a 
callback to the ReplicationEndpoint before recording; the replication endpoint 
can use this callback to do whatever it needs, and the shipper does not know 
the details.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  @Nullable
+  // Returns IOException instead of throwing so callers can decide
+  // whether to retry (shipEdits) or best-effort log (run()).
+  private IOException persistLogPosition() {

Review Comment:
   +1






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-22 Thread via GitHub


taklwu commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2718075417


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  @Nullable
+  // Returns IOException instead of throwing so callers can decide
+  // whether to retry (shipEdits) or best-effort log (run()).
+  private IOException persistLogPosition() {
+if (lastShippedBatch == null) {
+  return null;
+}
+
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+
+if (endpoint.isBufferedReplicationEndpoint() && stagedWalSize > 0) {
+  source.getReplicationEndpoint().beforePersistingReplicationOffset();
+}
+
+stagedWalSize = 0;
+lastStagedFlushTs = EnvironmentEdgeManager.currentTime();
+
+// Clean up hfile references
+try {
+  for (Entry entry : entriesForCleanUpHFileRefs) {
+cleanUpHFileRefs(entry.getEdit());
+LOG.trace("shipped entry {}: ", entry);
+  }
+} catch (IOException e) {
+  LOG.warn("{} threw exception while cleaning up hfile refs", 
endpoint.getClass().getName(), e);
+  return e;

Review Comment:
   ```suggestion
 throw e;
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -98,6 +104,13 @@ public final void run() {
 LOG.info("Running ReplicationSourceShipper Thread for wal group: {}", 
this.walGroupId);
 // Loop until we close down
 while (isActive()) {
+  // check if flush needed for WAL backup, this is need for timeout based 
flush
+  if (shouldPersistLogPosition()) {
+IOException error = persistLogPosition();
+if (error != null) {
+  LOG.warn("Exception while persisting replication state", error);
+}
+  }

Review Comment:
   nit: I don't think you need to change the return type to `IOException`; 
instead, you can just use `try-catch` if you really need to handle any 
exception differently. 
   
   ```suggestion
 if (shouldPersistLogPosition()) {
   persistLogPosition();
 }
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +246,53 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+ReplicationEndpoint endpoint = source.getReplicationEndpoint();
+if (!endpoint.isBufferedReplicationEndpoint()) {
+  // Non-buffering endpoints persist immediately
+  return true;
+}
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return stagedWalSize >= endpoint.getMaxBufferSize()
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs >= 
endpoint.maxFlushInterval());
+  }
+
+  @Nullable
+  // Returns IOException instead of throwing so callers can decide
+  // whether to retry (shipEdits) or best-effort log (run()).
+  private IOException persistLogPosition() {

Review Comment:
   ```suggestion
 private void persistLogPosition() throws IOException {
   ```






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-22 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3786077391

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
    _ Prechecks _ |
    _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 27s |  |  master passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 55s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 51s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 241m 59s |  |  hbase-server in the patch 
passed.  |
   |  |   | 269m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux d4d0c9ba21ef 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / daded968719a4ba7ae0af58cfc0b2b1083f35ad7 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/3/testReport/
 |
   | Max. process+thread count | 3736 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/3/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-21 Thread via GitHub


anmolnar commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3781194007

   @wchevreuil @Apache9 Thank you guys for the suggestion. I'm happy to merge 
it to either branch once it's accepted.
   
   Please also provide feedback on the patch itself. Does it match the approach 
that you suggested on the feature branch @Apache9 ?





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-21 Thread via GitHub


Apache9 commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3777600615

   > Why are we targeting this to master? Shouldn't it be on the feature branch?
   > 
   > Also, as already mentioned by @anmolnar and @taklwu , we should refrain 
from adding logic that is specific to the continuous backup replication in the 
generic replication interfaces/classes.
   
   This is my suggestion: we should target the master branch to add this 
feature first, and then reimplement the continuous backup feature on top of 
this change. Of course, we do not need to merge this to master first; this is 
just for better reviewing. We can land this on a feature branch and then rebase 
the continuous backup branch.
   
   Thanks.





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-20 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2709926770


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +239,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return (stagedWalSize >= 
source.getReplicationEndpoint().getMaxBufferSize())

Review Comment:
   I can agree with this comment too. You introduced two new properties: 
`getMaxBufferSize()` and `maxFlushInterval()` with default values which should 
mimic the original behavior. I think you documented it in `ReplicationEndpoint` 
interface properly and it would be useful to explicitly check for it too.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,29 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup
+  // file. So we return config value CONF_BACKUP_MAX_WAL_SIZE for
+  // ContinuousBackupReplicationEndpoint
+  // and -1 for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we 
update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()

Review Comment:
   @ankitsol It seems that this comment is valid.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +239,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return (stagedWalSize >= 
source.getReplicationEndpoint().getMaxBufferSize())
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs
+  >= source.getReplicationEndpoint().maxFlushInterval());
+  }
+
+  private void persistLogPosition() {
+if (lastShippedBatch == null) {
+  return;
+}
+if (stagedWalSize > 0) {
+  source.getReplicationEndpoint().beforePersistingReplicationOffset();
+}
+stagedWalSize = 0;
+lastStagedFlushTs = EnvironmentEdgeManager.currentTime();
+
+// Clean up hfile references
+for (Entry entry : entriesForCleanUpHFileRefs) {
+  try {
+cleanUpHFileRefs(entry.getEdit());
+  } catch (IOException e) {
+LOG.warn("{} threw unknown exception:",
+  source.getReplicationEndpoint().getClass().getName(), e);
+  }

Review Comment:
   Agree. Either the exception should be rethrown here or don't need to catch 
at all.



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,29 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup
+  // file. So we return config value CONF_BACKUP_MAX_WAL_SIZE for
+  // ContinuousBackupReplicationEndpoint
+  // and -1 for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we 
update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()
+  default long getMaxBufferSize() {
+return -1;
+  }
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup
+  // file. So we return config value CONF_STAGED_WAL_FLUSH_INTERVAL for
+  // ContinuousBackupReplicationEndpoint
+  // and Long.MAX_VALUE for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we 
update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()

Review Comment:
   Same here.



##
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceShipperBufferedFlush.java:
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License

Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-20 Thread via GitHub


anmolnar commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2709831534


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,29 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup
+  // file. So we return config value CONF_BACKUP_MAX_WAL_SIZE for
+  // ContinuousBackupReplicationEndpoint
+  // and -1 for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we 
update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()
+  default long getMaxBufferSize() {
+return -1;
+  }
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup

Review Comment:
   @ankitsol  I agree. Please remove these references from this patch.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-20 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3774098500

   Hi @Apache9 @anmolnar @vinayakphegde, just a gentle reminder on this PR when 
you get a moment 🙂 Thanks! 
   This PR is a follow-up to https://github.com/apache/hbase/pull/7591





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-20 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3772997503

   > can you also check if these two failures are related?
   > 
   > TestHRegionWithInMemoryFlush.testParallelIncrementWithMemStoreFlush 
TestTags.testFlushAndCompactionwithCombinations
   
   These are passing locally





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-12 Thread via GitHub


taklwu commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2684251451


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,29 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup
+  // file. So we return config value CONF_BACKUP_MAX_WAL_SIZE for
+  // ContinuousBackupReplicationEndpoint
+  // and -1 for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we 
update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()
+  default long getMaxBufferSize() {
+return -1;
+  }
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before 
flushing to WAL backup

Review Comment:
   nit: `ContinuousBackupReplicationEndpoint` is part of 
https://github.com/apache/hbase/pull/7591/ and has not yet been committed to 
master; should we mention it this early in this change? 



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +239,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return (stagedWalSize >= 
source.getReplicationEndpoint().getMaxBufferSize())
+  || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs
+  >= source.getReplicationEndpoint().maxFlushInterval());
+  }
+
+  private void persistLogPosition() {
+if (lastShippedBatch == null) {
+  return;
+}
+if (stagedWalSize > 0) {
+  source.getReplicationEndpoint().beforePersistingReplicationOffset();
+}
+stagedWalSize = 0;
+lastStagedFlushTs = EnvironmentEdgeManager.currentTime();
+
+// Clean up hfile references
+for (Entry entry : entriesForCleanUpHFileRefs) {
+  try {
+cleanUpHFileRefs(entry.getEdit());
+  } catch (IOException e) {
+LOG.warn("{} threw unknown exception:",
+  source.getReplicationEndpoint().getClass().getName(), e);
+  }

Review Comment:
   nit: will this be a behavior change? Previously, when `cleanUpHFileRefs` 
failed, the exception was thrown through the function, but here we are only 
logging it.






Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-12 Thread via GitHub


Copilot commented on code in PR #7617:
URL: https://github.com/apache/hbase/pull/7617#discussion_r2684440781


##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +239,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+if (stagedWalSize == 0 || lastShippedBatch == null) {
+  return false;
+}
+return (stagedWalSize >= 
source.getReplicationEndpoint().getMaxBufferSize())

Review Comment:
   The condition checks if stagedWalSize is greater than or equal to 
getMaxBufferSize(), but getMaxBufferSize() can return -1 for non-buffering 
endpoints (as documented in ReplicationEndpoint). This means the comparison 
'stagedWalSize >= -1' would always be true when stagedWalSize > 0, causing 
immediate flushes for non-buffering endpoints. While this preserves existing 
behavior, the intent is unclear and could be confusing. Consider explicitly 
checking for -1 to make the logic more explicit.
   ```suggestion
   long maxBufferSize = source.getReplicationEndpoint().getMaxBufferSize();
   // For non-buffering endpoints, getMaxBufferSize() returns a negative 
value (e.g., -1).
   // In that case, we always trigger a flush based on size as soon as 
there is staged data.
   boolean sizeBasedFlush =
 (maxBufferSize < 0) || (stagedWalSize >= maxBufferSize);
   return sizeBasedFlush
   ```



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceShipper.java:
##
@@ -229,6 +239,41 @@ private void shipEdits(WALEntryBatch entryBatch) {
 }
   }
 
+  private boolean shouldPersistLogPosition() {
+    if (stagedWalSize == 0 || lastShippedBatch == null) {
+      return false;
+    }
+    return (stagedWalSize >= source.getReplicationEndpoint().getMaxBufferSize())
+      || (EnvironmentEdgeManager.currentTime() - lastStagedFlushTs
+        >= source.getReplicationEndpoint().maxFlushInterval());
+  }
+
+  private void persistLogPosition() {
+    if (lastShippedBatch == null) {
+      return;
+    }
+    if (stagedWalSize > 0) {
+      source.getReplicationEndpoint().beforePersistingReplicationOffset();
+    }
+    stagedWalSize = 0;
+    lastStagedFlushTs = EnvironmentEdgeManager.currentTime();
+
+    // Clean up hfile references
+    for (Entry entry : entriesForCleanUpHFileRefs) {
+      try {
+        cleanUpHFileRefs(entry.getEdit());
+      } catch (IOException e) {
Review Comment:
   The error handling for IOException has been changed to catch and log the 
exception instead of propagating it. This silently suppresses IOException 
failures during cleanup, which could hide serious issues like file system 
problems. If cleanup failures should be non-fatal, this should be explicitly 
documented, or consider at least incrementing a failure metric to track these 
errors.
   ```suggestion
      } catch (IOException e) {
        // Cleanup failures are intentionally treated as non-fatal: replication has already
        // succeeded for these entries, so we log the failure and continue.
   ```
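To illustrate the reviewer's suggestion of tracking non-fatal cleanup failures via a metric, a minimal self-contained sketch follows. The `HFileRefCleaner` interface, `cleanupFailures` counter, and `cleanUpAll` helper are all hypothetical stand-ins, not HBase APIs:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CleanupSketch {
  // Hypothetical stand-in for a failure metric the review suggests incrementing.
  static final AtomicLong cleanupFailures = new AtomicLong();

  interface HFileRefCleaner {
    void cleanUp(String ref) throws IOException;
  }

  /**
   * Cleanup failures are treated as non-fatal: replication has already
   * succeeded for these entries, so we count and log the failure and
   * continue with the remaining refs instead of propagating IOException.
   */
  static List<String> cleanUpAll(List<String> refs, HFileRefCleaner cleaner) {
    List<String> failed = new ArrayList<>();
    for (String ref : refs) {
      try {
        cleaner.cleanUp(ref);
      } catch (IOException e) {
        cleanupFailures.incrementAndGet(); // surface the failure via a metric
        failed.add(ref);                   // and remember it for logging
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> failed = cleanUpAll(List.of("a", "b"), ref -> {
      if (ref.equals("b")) {
        throw new IOException("simulated file system problem");
      }
    });
    System.out.println(failed);          // the refs whose cleanup failed
    System.out.println(cleanupFailures); // the failure count
  }
}
```

Incrementing a counter keeps the non-fatal behavior while making file-system problems visible in monitoring rather than only in logs.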



##
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationEndpoint.java:
##
@@ -283,4 +283,29 @@ public int getTimeout() {
* @throws IllegalStateException if this service's state isn't FAILED.
*/
   Throwable failureCause();
+
+  // WAL entries are buffered in ContinuousBackupReplicationEndpoint before flushing to WAL backup
+  // file. So we return config value CONF_BACKUP_MAX_WAL_SIZE for
+  // ContinuousBackupReplicationEndpoint
+  // and -1 for other ReplicationEndpoint since they don't buffer.
+  // For other ReplicationEndpoint, everytime a WALEntryBatch is shipped, we update replication
+  // offset. Please check ReplicationSourceShipper#shouldFlushStagedWal()

Review Comment:
   The comment references 'shouldFlushStagedWal()' but the actual method name 
in ReplicationSourceShipper is 'shouldPersistLogPosition()'. This inconsistency 
will confuse developers trying to understand the interaction between these 
components.



##
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceShipperBufferedFlush.java:
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */

Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-12 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3740810146

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 14s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
   ||| _ Prechecks _ ||
   ||| _ master Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 19s |  |  master passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 20s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
   ||| _ Patch Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   2m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 20s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
   ||| _ Other Tests _ ||
   | -1 :x: |  unit  | 251m 26s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   | 277m 59s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.52 ServerAPI=1.52 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux 82b2e4b8e507 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 
24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 95000e024dbdf7543366f7e883db615c10d43d49 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/testReport/
 |
   | Max. process+thread count | 4718 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-12 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3739962557

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 28s |  |  Docker mode activated.  |
   ||| _ Prechecks _ ||
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
   ||| _ master Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 39s |  |  master passed  |
   | +1 :green_heart: |  compile  |   4m  9s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  5s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   1m  4s |  |  branch has no errors when 
running spotless:check.  |
   ||| _ Patch Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  hadoopcheck  |  13m 50s |  |  Patch does not cause any 
errors with Hadoop 3.3.6 3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 56s |  |  patch has no errors when 
running spotless:check.  |
   ||| _ Other Tests _ ||
   | +1 :green_heart: |  asflicense  |   0m 13s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  48m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux 4f1801e5c24c 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 95000e024dbdf7543366f7e883db615c10d43d49 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | Max. process+thread count | 83 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/2/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-12 Thread via GitHub


ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3739717187

   @Apache9 @anmolnar @vinayakphegde Please review this PR as followup to 
preview PR https://github.com/apache/hbase/pull/7591





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-11 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3736834357

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 28s |  |  Docker mode activated.  |
   ||| _ Prechecks _ ||
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  |  Patch does not have any 
anti-patterns.  |
   ||| _ master Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 30s |  |  master passed  |
   | +1 :green_heart: |  compile  |   3m 28s |  |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 51s |  |  branch has no errors when 
running spotless:check.  |
   ||| _ Patch Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  hadoopcheck  |  12m  7s |  |  Patch does not cause any 
errors with Hadoop 3.3.6 3.4.1.  |
   | +1 :green_heart: |  spotless  |   0m 46s |  |  patch has no errors when 
running spotless:check.  |
   ||| _ Other Tests _ ||
   | +1 :green_heart: |  asflicense  |   0m 11s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  41m  6s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | dupname asflicense javac spotbugs checkstyle codespell 
detsecrets compile hadoopcheck hbaseanti spotless |
   | uname | Linux cb70c3150cf1 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / ea8ed6a845cd1ba2f4660974cb3368338cfd7576 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   | Max. process+thread count | 86 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/console 
|
   | versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] HBASE-29823 Control WAL flush and offset persistence from ReplicationSourceShipper [hbase]

2026-01-11 Thread via GitHub


Apache-HBase commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3736828446

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   2m 48s |  |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --author-ignore-list 
--blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck  |
   ||| _ Prechecks _ ||
   ||| _ master Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m 19s |  |  master passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m  3s |  |  branch has no errors when 
building our shaded downstream artifacts.  |
   ||| _ Patch Compile Tests _ ||
   | +1 :green_heart: |  mvninstall  |   3m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m  0s |  |  patch has no errors when 
building our shaded downstream artifacts.  |
   ||| _ Other Tests _ ||
   | -1 :x: |  unit  |   9m 16s | 
[/patch-unit-hbase-server.txt](https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt)
 |  hbase-server in the patch failed.  |
   |  |   |  35m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/7617 |
   | Optional Tests | javac javadoc unit compile shadedjars |
   | uname | Linux 1274cf557e0e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / ea8ed6a845cd1ba2f4660974cb3368338cfd7576 |
   | Default Java | Eclipse Adoptium-17.0.11+9 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/testReport/
 |
   | Max. process+thread count | 1120 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7617/1/console 
|
   | versions | git=2.34.1 maven=3.9.8 |
   | Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   

