Re: [PR] CASSANDRA-19744 Accord migration and interop correctness [cassandra-accord]

via GitHub Mon, 12 Aug 2024 12:42:56 -0700


dcapwell commented on code in PR #101:
URL: https://github.com/apache/cassandra-accord/pull/101#discussion_r1714286298



##########
accord-core/src/main/java/accord/topology/TopologyManager.java:
##########
@@ -371,15 +391,87 @@ private int indexOf(long epoch)
         }
     }
 
+    private static class FutureEpoch
+    {
+        private volatile AsyncResult.Settable<Void> future;
+        private long deadlineMillis;
+
+        public FutureEpoch(long deadlineMillis)
+        {
+            this.future = AsyncResults.settable();
+            this.deadlineMillis = deadlineMillis;
+        }
+
+        /*
+         * Notify any listeners that are waiting for the epoch that is has been a long time since
+         * we started waiting for the epoch. We may still eventually get the epoch so also create
+         * a new future so subsequent operations may have a chance at seeing the epoch if it ever appears.
+         *
+         * Subsequent waiters may get a timeout notification far sooner (WATCHDOG_INTERVAL_MILLISS)
+         * instead of EPOCH_INITIAL_TIMEOUT_MILLIS
+         */
+        @GuardedBy("TopologyManager.this")
+        private void timeOutCurrentListeners(long newDeadline, Agent agent)
+        {
+            deadlineMillis = newDeadline;
+            AsyncResult.Settable<Void> oldFuture = future;
+            if (oldFuture.isDone())
+                return;
+            future = AsyncResults.settable();
+            future.addCallback(agent);
+            oldFuture.tryFailure(new Timeout(null, null));
+        }
+    }
+
     private final TopologySorter.Supplier sorter;
+    private final Agent agent;
     private final Id node;
+    private final Scheduler scheduler;
+    private final ToLongFunction<TimeUnit> nowTimeUnit;
     private volatile Epochs epochs;
+    private Scheduler.Scheduled topologyUpdateWatchdog;
+
+    private final LocalConfig localConfig;
 
-    public TopologyManager(TopologySorter.Supplier sorter, Id node)
+    public TopologyManager(TopologySorter.Supplier sorter, Agent agent, Id node, Scheduler scheduler, ToLongFunction<TimeUnit> nowTimeUnit, LocalConfig localConfig)
     {
         this.sorter = sorter;
+        this.agent = agent;
         this.node = node;
+        this.scheduler = scheduler;
+        this.nowTimeUnit = nowTimeUnit;
         this.epochs = Epochs.EMPTY;
+        this.localConfig = localConfig;
+    }
+
+    public void shutdown()
+    {
+        topologyUpdateWatchdog.cancel();
+    }
+
+    public void scheduleTopologyUpdateWatchdog()

Review Comment:
   Spoke with Ariel, and this likely isn't a big deal. When a local node sees an epoch from a peer, it won't have "received" that epoch until all of the following have happened:
   
   1) TCM notifies the node
   2) the previous epoch has been passed to `epochs.receive(topology);`
   3) the previous epoch has constructed the `Bootstrap` class (or `EpochReady.DONE`).
   
   Once all 3 are done, the local node receives the topology. In practice this means we mostly wait on TCM rather than on the other steps, since those are "fast" (assuming nothing hangs forever).
   
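   As a rough illustration of that gating (not the actual `TopologyManager` code), the three conditions can be modelled as three futures that must all complete before the epoch counts as "received". Every name below is hypothetical, and plain `java.util.concurrent.CompletableFuture` stands in for Accord's `AsyncResult`:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: none of these names come from the PR.
final class EpochGateSketch
{
    // 1) TCM has notified the node about the epoch
    final CompletableFuture<Void> tcmNotified = new CompletableFuture<>();
    // 2) the previous epoch has been received
    final CompletableFuture<Void> previousEpochReceived = new CompletableFuture<>();
    // 3) the previous epoch has built its Bootstrap (or was already done)
    final CompletableFuture<Void> previousEpochBootstrapBuilt = new CompletableFuture<>();

    // Completes only after all three conditions above have completed
    final CompletableFuture<Void> received =
        CompletableFuture.allOf(tcmNotified, previousEpochReceived, previousEpochBootstrapBuilt);

    public static void main(String[] args)
    {
        EpochGateSketch gate = new EpochGateSketch();
        // The two "fast" local steps finish first
        gate.previousEpochReceived.complete(null);
        gate.previousEpochBootstrapBuilt.complete(null);
        System.out.println("received before TCM notifies: " + gate.received.isDone());
        // The epoch is only "received" once TCM has also notified the node
        gate.tcmNotified.complete(null);
        System.out.println("received after TCM notifies: " + gate.received.isDone());
    }
}
```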
   
   Given this, a batch timeout should be fine. We can always change it later if it's an issue.
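
   For reference, a minimal standalone sketch of what "batch timeout" means here, loosely modelled on `FutureEpoch.timeOutCurrentListeners` from the diff above. It assumes plain `CompletableFuture` and `TimeoutException` in place of Accord's `AsyncResult.Settable` and `Timeout`, and the class and the method names other than `timeOutCurrentListeners` are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for FutureEpoch; not the PR's implementation.
final class BatchTimeoutSketch
{
    private CompletableFuture<Void> future = new CompletableFuture<>();
    private long deadlineMillis;

    BatchTimeoutSketch(long deadlineMillis)
    {
        this.deadlineMillis = deadlineMillis;
    }

    // Callers waiting for the epoch all share whichever future is current
    synchronized CompletableFuture<Void> currentFuture()
    {
        return future;
    }

    synchronized long deadlineMillis()
    {
        return deadlineMillis;
    }

    // Fail every waiter on the current future in one batch, then swap in a
    // fresh future so later waiters can still observe the epoch if it arrives
    synchronized void timeOutCurrentListeners(long newDeadlineMillis)
    {
        deadlineMillis = newDeadlineMillis;
        CompletableFuture<Void> oldFuture = future;
        if (oldFuture.isDone())
            return;
        future = new CompletableFuture<>();
        oldFuture.completeExceptionally(new TimeoutException("epoch not received before deadline"));
    }

    // Called when the epoch finally shows up
    synchronized void epochReceived()
    {
        future.complete(null);
    }

    public static void main(String[] args)
    {
        BatchTimeoutSketch epoch = new BatchTimeoutSketch(1_000);
        CompletableFuture<Void> early = epoch.currentFuture();
        epoch.timeOutCurrentListeners(2_000);
        CompletableFuture<Void> late = epoch.currentFuture();
        epoch.epochReceived();
        System.out.println("early waiter timed out: " + early.isCompletedExceptionally());
        System.out.println("late waiter saw the epoch: " + (late.isDone() && !late.isCompletedExceptionally()));
    }
}
```

   The trade-off in this sketch is that a waiter who joins just before the watchdog fires is timed out along with everyone else, rather than getting its own full timeout window.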


