ifesdjeen commented on code in PR #3656:
URL: https://github.com/apache/cassandra/pull/3656#discussion_r1831214651
##########
src/java/org/apache/cassandra/tcm/PaxosBackedProcessor.java:
##########
@@ -176,7 +176,23 @@ public LogState getLocalState(Epoch start, Epoch end,
boolean includeSnapshot)
@Override
public LogState getLogState(Epoch start, Epoch end, boolean
includeSnapshot, Retry.Deadline retryPolicy)
{
- return DistributedMetadataLogKeyspace.getLogState(start, end,
includeSnapshot);
+ while (!retryPolicy.reachedMax())
+ {
+ if (Thread.currentThread().isInterrupted())
+ {
+ Thread.currentThread().interrupt();
Review Comment:
I think I have asked about this one in the other thread, but wanted to learn
why it is important to re-interrupt already interrupted thread.
##########
src/java/org/apache/cassandra/tcm/RemoteProcessor.java:
##########
@@ -177,7 +177,7 @@ public LogState getLogState(Epoch lowEpoch, Epoch
highEpoch, boolean includeSnap
}
catch (ExecutionException | TimeoutException e)
{
- throw new RuntimeException("Could not reconstruct", e);
+ throw new RuntimeException(String.format("Could not reconstruct
range %d, %d", lowEpoch.getEpoch(), highEpoch.getEpoch()), e);
Review Comment:
👍
##########
test/distributed/org/apache/cassandra/distributed/impl/Instance.java:
##########
@@ -651,7 +666,28 @@ public void startup(ICluster cluster)
throw (RuntimeException) t;
throw new RuntimeException(t);
}
- }).run();
+ }).call();
+ DurationSpec timeout = startupTimeout();
+ if (timeout == null)
+ {
+ waitOn(result);
+ }
+ else
+ {
+ try
+ {
+ result.get(timeout.quantity(), timeout.unit());
+ }
+ catch (InterruptedException e)
+ {
+ Thread.currentThread().interrupt();
Review Comment:
Ah nice, good to know, thank you for elaborating!
##########
test/harry/main/org/apache/cassandra/harry/sut/TokenPlacementModel.java:
##########
@@ -237,6 +237,25 @@ public static List<Node> peerStateToNodes(Object[][]
resultset)
return nodes;
}
+ private static <T> T get(Object[] row, int idx, String name)
+ {
+ T t = (T) row[idx];
+ if (t == null || ((t instanceof Collection<?>) && ((Collection<?>)
t).isEmpty()))
+ throw new IncompletePeersStateException(name);
+ return t;
+ }
+
+ /**
+ * When the node sees the new epoch, the update of the peers table is
async, so checking the table may yield partial results, so need some way to
detect this to enable retries.
Review Comment:
Right; we would need to tell Harry about the changes in the ring. We already
do that in simulator.
##########
src/java/org/apache/cassandra/utils/concurrent/Ref.java:
##########
@@ -601,6 +601,8 @@ void traverse(final RefCounted.Tidy rootObject)
InProgressVisit inProgress = null;
while (inProgress != null || !path.isEmpty())
{
+ if (Thread.currentThread().isInterrupted())
Review Comment:
Should we re-interrupt here, too?
##########
src/java/org/apache/cassandra/service/accord/AccordService.java:
##########
@@ -1222,15 +1264,27 @@ public void tryMarkRemoved(Topology topology, Id target)
if (node.commandStores().count() == 0) return; // when starting up
stores can be empty, so ignore
Ranges ranges = topology.rangesForNode(target);
if (ranges.isEmpty()) return;
- tryMarkRemoved(ranges, 0).begin(node().agent());
+ long startNanos = Clock.Global.nanoTime();
+ exclusiveSyncPointWithRetries(ranges, 0)
Review Comment:
Sure, was just wondering really. Thank you for going into detail.
##########
src/java/org/apache/cassandra/service/accord/AccordService.java:
##########
@@ -568,6 +567,18 @@ public static List<ClusterMetadata> tcmLoadRange(long min,
long max)
return afterLoad;
}
+ private static List<ClusterMetadata> reconstruct(long min, long max)
+ {
+ Epoch start = Epoch.create(min);
+ Epoch end = Epoch.create(max);
+ Retry.Deadline deadline = Retry.Deadline.wrap(new
Retry.ExponentialBackoff(42,
Review Comment:
Is there any reason we're not using TCM configurable here? I'll also be OK
if we let it be configured it some other way.
##########
test/distributed/org/apache/cassandra/fuzz/sai/SingleNodeSAITestBase.java:
##########
@@ -116,6 +116,7 @@ public void basicSaiTest()
ColumnSpec.regularColumn("v3", ColumnSpec.int64Type)),
List.of(ColumnSpec.staticColumn("s1", ColumnSpec.asciiType(40, 100))),
withAccord ?
Optional.of(TransactionalMode.full) : Optional.empty())
+ .withWriteTimeFromAccord(false) // use the harry
timestamp
Review Comment:
I wish it was simpler to turn off timestamps in harry; glad this is going to
be the case going forward.
##########
src/java/org/apache/cassandra/tcm/PaxosBackedProcessor.java:
##########
@@ -168,15 +168,31 @@ public ClusterMetadata fetchLogAndWait(Epoch waitFor,
Retry.Deadline retryPolicy
}
@Override
- public LogState getLocalState(Epoch start, Epoch end, boolean
includeSnapshot, Retry.Deadline retryPolicy)
+ public LogState getLocalState(Epoch start, Epoch end, boolean
includeSnapshot)
{
return log.storage().getLogState(start, end, includeSnapshot);
}
@Override
public LogState getLogState(Epoch start, Epoch end, boolean
includeSnapshot, Retry.Deadline retryPolicy)
{
- return DistributedMetadataLogKeyspace.getLogState(start, end,
includeSnapshot);
+ while (!retryPolicy.reachedMax())
+ {
+ if (Thread.currentThread().isInterrupted())
+ {
+ Thread.currentThread().interrupt();
+ throw new RuntimeException("Can not reconstruct during
shutdown", new InterruptedException());
+ }
+ try
+ {
+ return DistributedMetadataLogKeyspace.getLogState(start, end,
includeSnapshot);
+ }
+ catch (RuntimeException e) // honestly best to only retry
timeouts, but everything gets wrapped in a RuntimeException...
Review Comment:
Maybe even Throwable?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]