ifesdjeen commented on code in PR #3656:
URL: https://github.com/apache/cassandra/pull/3656#discussion_r1831214651


##########
src/java/org/apache/cassandra/tcm/PaxosBackedProcessor.java:
##########
@@ -176,7 +176,23 @@ public LogState getLocalState(Epoch start, Epoch end, 
boolean includeSnapshot)
     @Override
     public LogState getLogState(Epoch start, Epoch end, boolean 
includeSnapshot, Retry.Deadline retryPolicy)
     {
-        return DistributedMetadataLogKeyspace.getLogState(start, end, 
includeSnapshot);
+        while (!retryPolicy.reachedMax())
+        {
+            if (Thread.currentThread().isInterrupted())
+            {
+                Thread.currentThread().interrupt();

Review Comment:
   I think I have asked about this one in the other thread, but wanted to learn 
why it is important to re-interrupt already interrupted thread.



##########
src/java/org/apache/cassandra/tcm/RemoteProcessor.java:
##########
@@ -177,7 +177,7 @@ public LogState getLogState(Epoch lowEpoch, Epoch 
highEpoch, boolean includeSnap
         }
         catch (ExecutionException | TimeoutException e)
         {
-            throw new RuntimeException("Could not reconstruct", e);
+            throw new RuntimeException(String.format("Could not reconstruct 
range %d, %d", lowEpoch.getEpoch(), highEpoch.getEpoch()), e);

Review Comment:
   👍 
   



##########
test/distributed/org/apache/cassandra/distributed/impl/Instance.java:
##########
@@ -651,7 +666,28 @@ public void startup(ICluster cluster)
                     throw (RuntimeException) t;
                 throw new RuntimeException(t);
             }
-        }).run();
+        }).call();
+        DurationSpec timeout = startupTimeout();
+        if (timeout == null)
+        {
+            waitOn(result);
+        }
+        else
+        {
+            try
+            {
+                result.get(timeout.quantity(), timeout.unit());
+            }
+            catch (InterruptedException e)
+            {
+                Thread.currentThread().interrupt();

Review Comment:
   Ah nice, good to know, thank you for elaborating!



##########
test/harry/main/org/apache/cassandra/harry/sut/TokenPlacementModel.java:
##########
@@ -237,6 +237,25 @@ public static List<Node> peerStateToNodes(Object[][] 
resultset)
         return nodes;
     }
 
+    private static <T> T get(Object[] row, int idx, String name)
+    {
+        T t = (T) row[idx];
+        if (t == null || ((t instanceof Collection<?>) && ((Collection<?>) 
t).isEmpty()))
+            throw new IncompletePeersStateException(name);
+        return t;
+    }
+
+    /**
+     * When the node sees the new epoch, the update of the peers table is 
async, so checking the table may yield partial results, so need some way to 
detect this to enable retries.

Review Comment:
   Right; we would need to tell Harry about the changes in the ring. We already 
do that in simulator. 



##########
src/java/org/apache/cassandra/utils/concurrent/Ref.java:
##########
@@ -601,6 +601,8 @@ void traverse(final RefCounted.Tidy rootObject)
             InProgressVisit inProgress = null;
             while (inProgress != null || !path.isEmpty())
             {
+                if (Thread.currentThread().isInterrupted())

Review Comment:
   Should we re-interrupt here, too?



##########
src/java/org/apache/cassandra/service/accord/AccordService.java:
##########
@@ -1222,15 +1264,27 @@ public void tryMarkRemoved(Topology topology, Id target)
         if (node.commandStores().count() == 0) return; // when starting up 
stores can be empty, so ignore
         Ranges ranges = topology.rangesForNode(target);
         if (ranges.isEmpty()) return;
-        tryMarkRemoved(ranges, 0).begin(node().agent());
+        long startNanos = Clock.Global.nanoTime();
+        exclusiveSyncPointWithRetries(ranges, 0)

Review Comment:
   Sure, was just wondering really. Thank you for going into detail.



##########
src/java/org/apache/cassandra/service/accord/AccordService.java:
##########
@@ -568,6 +567,18 @@ public static List<ClusterMetadata> tcmLoadRange(long min, 
long max)
         return afterLoad;
     }
 
+    private static List<ClusterMetadata> reconstruct(long min, long max)
+    {
+        Epoch start = Epoch.create(min);
+        Epoch end = Epoch.create(max);
+        Retry.Deadline deadline = Retry.Deadline.wrap(new 
Retry.ExponentialBackoff(42,

Review Comment:
   Is there any reason we're not using TCM configurable here? I'll also be OK 
if we let it be configured it some other way.



##########
test/distributed/org/apache/cassandra/fuzz/sai/SingleNodeSAITestBase.java:
##########
@@ -116,6 +116,7 @@ public void basicSaiTest()
                                                          
ColumnSpec.regularColumn("v3", ColumnSpec.int64Type)),
                                            
List.of(ColumnSpec.staticColumn("s1", ColumnSpec.asciiType(40, 100))),
                                            withAccord ? 
Optional.of(TransactionalMode.full) : Optional.empty())
+                            .withWriteTimeFromAccord(false) // use the harry 
timestamp

Review Comment:
   I wish it was simpler to turn off timestamps in harry; glad this is going to 
be the case going forward.



##########
src/java/org/apache/cassandra/tcm/PaxosBackedProcessor.java:
##########
@@ -168,15 +168,31 @@ public ClusterMetadata fetchLogAndWait(Epoch waitFor, 
Retry.Deadline retryPolicy
     }
 
     @Override
-    public LogState getLocalState(Epoch start, Epoch end, boolean 
includeSnapshot, Retry.Deadline retryPolicy)
+    public LogState getLocalState(Epoch start, Epoch end, boolean 
includeSnapshot)
     {
         return log.storage().getLogState(start, end, includeSnapshot);
     }
 
     @Override
     public LogState getLogState(Epoch start, Epoch end, boolean 
includeSnapshot, Retry.Deadline retryPolicy)
     {
-        return DistributedMetadataLogKeyspace.getLogState(start, end, 
includeSnapshot);
+        while (!retryPolicy.reachedMax())
+        {
+            if (Thread.currentThread().isInterrupted())
+            {
+                Thread.currentThread().interrupt();
+                throw new RuntimeException("Can not reconstruct during 
shutdown", new InterruptedException());
+            }
+            try
+            {
+                return DistributedMetadataLogKeyspace.getLogState(start, end, 
includeSnapshot);
+            }
+            catch (RuntimeException e) // honestly best to only retry 
timeouts, but everything gets wrapped in a RuntimeException...

Review Comment:
   Maybe even Throwable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to