ifesdjeen commented on code in PR #3196:
URL: https://github.com/apache/cassandra/pull/3196#discussion_r1559437385


##########
test/distributed/org/apache/cassandra/distributed/test/TransientRangeMovement2Test.java:
##########
@@ -51,9 +59,30 @@ public void testMoveBackward() throws IOException, 
ExecutionException, Interrupt
                                            .start()))
         {
             populate(cluster);
-            cluster.get(3).nodetoolResult("move", "25").asserts().success();
+            // At the start, node1 is a TRANSIENT replica for (20,30] and FULL 
for (30, 40], (40,] & (,10]. When moving
+            // node3 to token 25, node1 becomes a FULL replica for (25, 40], 
effectively going from TRANSIENT to FULL
+            // for (25,30]. A T->F transition will always cause data for that 
range to be streamed to the transitioning
+            // node, which happens after StartMove and before MidMove. Running 
cleanup before node1 considers itself a
+            // FULL replica would remove any of the newly streamed data which 
is marked repaired. To avoid this, we
+            // ensure that any T->F transition is applied for writes as part 
of the StartMove. Have the CMS node (node1)
+            // pause before the MidMove step is committed, at which point we 
know that streaming has completed.
+            Callable<Epoch> pending = pauseBeforeCommit(cluster.get(1), (e) -> 
e instanceof PrepareMove.MidMove);
+            new Thread(() -> cluster.get(3).nodetoolResult("move", 
"25").asserts().success()).start();

Review Comment:
   Would you mind to add a `join` after we have finished with this thread below?



##########
src/java/org/apache/cassandra/tcm/sequences/RemoveNodeStreams.java:
##########
@@ -131,33 +131,12 @@ private static MovementMap movementMap(InetAddressAndPort 
leaving, ClusterMetada
                                        if (!replica.endpoint().equals(leaving) 
&& !replica.endpoint().equals(newReplica.endpoint()))
                                            candidateBuilder.add(replica, 
ReplicaCollection.Builder.Conflict.NONE);
                                    });
-                                   movements.putAll(newReplica, 
candidateBuilder.build(), ReplicaCollection.Builder.Conflict.NONE);
+                                   EndpointsForRange sources = 
candidateBuilder.build();
+                                   // log if newReplica is an existing 
transient replica moving to a full replica

Review Comment:
   We can make a similar change in `UnbootstrapStreams` calculation of the 
movement map, change the mid check to 
   
   ``` 
   startWriteAdditions.get(removal.endpoint()).contains(removal.range(), true);
   ```



##########
src/java/org/apache/cassandra/tcm/sequences/RemoveNodeStreams.java:
##########
@@ -131,33 +131,12 @@ private static MovementMap movementMap(InetAddressAndPort 
leaving, ClusterMetada
                                        if (!replica.endpoint().equals(leaving) 
&& !replica.endpoint().equals(newReplica.endpoint()))
                                            candidateBuilder.add(replica, 
ReplicaCollection.Builder.Conflict.NONE);
                                    });
-                                   movements.putAll(newReplica, 
candidateBuilder.build(), ReplicaCollection.Builder.Conflict.NONE);
+                                   EndpointsForRange sources = 
candidateBuilder.build();
+                                   // log if newReplica is an existing 
transient replica moving to a full replica
+                                   if 
(startWriteRemovals.get(newReplica.endpoint()).contains(newReplica.range(), 
false))
+                                       logger.debug("Streaming transient -> 
full conversion to {} from {}", newReplica.endpoint(), sources);
+                                   movements.putAll(newReplica, sources, 
ReplicaCollection.Builder.Conflict.NONE);
                                });
-            // and check if any replicas went from transient -> full:

Review Comment:
   it also happens that we do not need mid and finish delta for this method 
anymore. Should we remove them?



##########
src/java/org/apache/cassandra/tcm/transformations/PrepareMove.java:
##########
@@ -103,7 +104,9 @@ public Result execute(ClusterMetadata prev)
         StartMove startMove = new StartMove(nodeId, 
transitionPlan.addToWrites(), lockKey);
         MidMove midMove = new MidMove(nodeId, transitionPlan.moveReads(), 
lockKey);
         FinishMove finishMove = new FinishMove(nodeId, tokens, 
transitionPlan.removeFromWrites(), lockKey);
-
+        Result res = 
PlacementTransitionPlan.assertPreExistingWriteReplica(prev.placements, 
transitionPlan);

Review Comment:
   Do you folks think maybe having an Option here could be useful? Or some form 
of polymorphic result that would visually explain the significance of `null` 
equivalent?



##########
test/distributed/org/apache/cassandra/distributed/test/log/PlacementSimulator.java:
##########
@@ -985,20 +977,84 @@ public Diff<T> invert()
         }
     }
 
-    public static String diffsToString(Map<Range, Diff<Node>> placements)
+    public static Diff<Replica> additionsAndTransientToFull(Diff<Replica> 
unfiltered)
+    {
+        if (unfiltered.additions.isEmpty())
+            return null;
+        List<Replica> additions = new ArrayList<>(unfiltered.additions.size());
+        List<Replica> removals = new ArrayList<>(unfiltered.additions.size());
+        for (Replica added : unfiltered.additions)
+        {
+            // Include any new FULL replicas here. If there exists the removal 
of a corresponding Transient
+            // replica (i.e. a switch from T -> F), add that too. We want T -> 
F transitions to happen early
+            // in a multi-step operation, at the same time as new write 
replicas are added.
+            if (added.isFull())
+            {
+                additions.add(added);
+                for (Replica removed : unfiltered.removals)
+                    if (removed.node().equals(added.node()) && 
removed.isTransient())
+                        removals.add(removed);
+            }
+            else
+            {
+                // Conversely, when a replica transitions from F -> T, it's 
enacted late in a multi-step operation.
+                // So only include TRANSIENT additions if there is no removal 
of a corresponding FULL replica.
+                boolean include = unfiltered.removals.stream()
+                                                     .noneMatch(removed -> 
removed.node().equals(added.node())
+                                                                           && 
removed.isFull());
+                if (include)
+                    additions.add(added);
+            }
+        }
+        return new Diff<>(additions, removals);
+    }
+
+    public static Diff<Replica> removalsAndFullToTransient(Diff<Replica> 
unfiltered)
+    {
+        if (unfiltered.removals.isEmpty())
+            return null;
+        List<Replica> additions = new ArrayList<>(unfiltered.removals.size());
+        List<Replica> removals = new ArrayList<>(unfiltered.removals.size());
+        for (Replica removed : unfiltered.removals)
+        {
+            // Include any new FULL replicas here. If there exists the removal 
of a corresponding Transient
+            // replica (i.e. a switch from T -> F), add that too. We want T -> 
F transitions to happen early
+            // in a multi-step operation, at the same time as new write 
replicas are added.
+            if (removed.isFull())
+            {
+                removals.add(removed);
+                for (Replica added : unfiltered.additions)

Review Comment:
   Nit: I think same comment applies then to the above 
`additionsAndTransientToFull` code.



##########
test/harry/main/org/apache/cassandra/harry/sut/TokenPlacementModel.java:
##########
@@ -62,18 +75,34 @@ public ReplicatedRanges replicate(List<Node> nodes)
         public abstract ReplicatedRanges replicate(Range[] ranges, List<Node> 
nodes);
     }
 
+    public static class DCReplicas
+    {
+        public final int totalCount;

Review Comment:
   Nit: should we try keeping API here consistent with RF?
   
   ```
       public final int allReplicas;
       public final int fullReplicas;
   ```



##########
test/distributed/org/apache/cassandra/distributed/test/log/PlacementSimulator.java:
##########
@@ -985,20 +977,84 @@ public Diff<T> invert()
         }
     }
 
-    public static String diffsToString(Map<Range, Diff<Node>> placements)
+    public static Diff<Replica> additionsAndTransientToFull(Diff<Replica> 
unfiltered)
+    {
+        if (unfiltered.additions.isEmpty())
+            return null;
+        List<Replica> additions = new ArrayList<>(unfiltered.additions.size());
+        List<Replica> removals = new ArrayList<>(unfiltered.additions.size());
+        for (Replica added : unfiltered.additions)
+        {
+            // Include any new FULL replicas here. If there exists the removal 
of a corresponding Transient
+            // replica (i.e. a switch from T -> F), add that too. We want T -> 
F transitions to happen early
+            // in a multi-step operation, at the same time as new write 
replicas are added.
+            if (added.isFull())
+            {
+                additions.add(added);
+                for (Replica removed : unfiltered.removals)

Review Comment:
   Nit: should use same style for filtering / checking removals in both 
branches? 
   
   I _think_ we can also break out from the loop here early if we did add a new 
entry to `removals`.



##########
test/distributed/org/apache/cassandra/distributed/test/TransientRangeMovement2Test.java:
##########
@@ -51,9 +59,30 @@ public void testMoveBackward() throws IOException, 
ExecutionException, Interrupt
                                            .start()))
         {
             populate(cluster);
-            cluster.get(3).nodetoolResult("move", "25").asserts().success();
+            // At the start, node1 is a TRANSIENT replica for (20,30] and FULL 
for (30, 40], (40,] & (,10]. When moving
+            // node3 to token 25, node1 becomes a FULL replica for (25, 40], 
effectively going from TRANSIENT to FULL
+            // for (25,30]. A T->F transition will always cause data for that 
range to be streamed to the transitioning
+            // node, which happens after StartMove and before MidMove. Running 
cleanup before node1 considers itself a
+            // FULL replica would remove any of the newly streamed data which 
is marked repaired. To avoid this, we
+            // ensure that any T->F transition is applied for writes as part 
of the StartMove. Have the CMS node (node1)
+            // pause before the MidMove step is committed, at which point we 
know that streaming has completed.
+            Callable<Epoch> pending = pauseBeforeCommit(cluster.get(1), (e) -> 
e instanceof PrepareMove.MidMove);
+            new Thread(() -> cluster.get(3).nodetoolResult("move", 
"25").asserts().success()).start();
+
+            // Before allowing node1 to proceed with committing the MidMove, 
instruct it to pause node1 before it

Review Comment:
   nit: This line reads a bit tautological, would maybe something like "To 
gate/prevent node1 from proceeding with committing MidMove, instruct it to 
pause before enacting it" ?



##########
src/java/org/apache/cassandra/tcm/transformations/PrepareJoin.java:
##########
@@ -156,6 +153,12 @@ public Result execute(ClusterMetadata prev)
                                                              
transitionPlan.toSplit,
                                                              startJoin, 
midJoin, finishJoin,
                                                              joinTokenRing, 
streamData);
+        if (!(this instanceof UnsafeJoin) && !prev.tokenMap.isEmpty())

Review Comment:
   Would you be OK to move this logic to an instance method, and have 
implemenation in PrepareJoin, and override it in UnsafeJoin? It would be great 
if we could keep it polymorphic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to