iamaleksey commented on code in PR #4106:
URL: https://github.com/apache/cassandra/pull/4106#discussion_r2070147166


##########
src/java/org/apache/cassandra/replication/Shard.java:
##########
@@ -94,6 +109,42 @@ void addSummaryForRange(AbstractBounds<PartitionPosition> range, boolean include
         });
     }
 
+    List<InetAddressAndPort> remoteReplicas()
+    {
+        List<InetAddressAndPort> replicas = new ArrayList<>(participants.size() - 1);
+        for (int i = 0, size = participants.size(); i < size; ++i)
+        {
+            int hostId = participants.get(i);
+            if (hostId != localHostId)
+                replicas.add(ClusterMetadata.current().directory.endpoint(new NodeId(hostId)));
+        }
+        return replicas;
+    }
+
+    /**
+     * Collects replicated offsets for the logs owned by this coordinator on this shard.
+     */
+    ShardReplicatedOffsets collectReplicatedOffsets()
+    {
+        Long2ObjectHashMap<LogReplicatedOffsets> offsets = new Long2ObjectHashMap<>();
+        for (CoordinatorLogPrimary log : primaryLogs())

Review Comment:
   It's not about the broadcast payload size in isolation, which I agree is
ultimately not a serious issue. There is also the work you need to do with
that message when it arrives. Multiply that by the frequency of broadcasts
and, possibly, by RF, and you get the final cost. There is a maximum cost that
we are willing to pay here, and since client write frequency is mostly outside
of our control, the main variable we have is the frequency of broadcasts. If
only the coordinator broadcasts its logs' states, we can afford a higher
broadcast frequency. If every replica does, we have to scale the maximum
broadcast frequency down by a factor on the order of RF. And we want the
broadcasts to be *frequent*, to make reads as cheap as possible: every
avoidable delay in propagation potentially costs us blocking on reconciles
that don't really need to be done, and/or triggering SRP (short read
protection) that a broadcast arriving earlier could have avoided.
Additionally, broadcasts from non-coordinator nodes will always be almost
entirely redundant subsets of the coordinator's broadcasts, since the
coordinator has the freshest and fullest picture, barring some in-flight
write responses.
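The cost argument above can be made concrete with a back-of-the-envelope sketch. It is a hypothetical model, not code from the PR. It assumes three things: each broadcast goes to every other participant in the shard (RF - 1 recipients), receiving and applying one broadcast costs a fixed number of abstract "work units", and there is a fixed budget of such work per second for one log. The names `BroadcastBudgetSketch` and `maxBroadcastHz` and all of the numbers are illustrative assumptions, not Cassandra APIs or measurements.

```java
/**
 * Hypothetical model of the broadcast-cost argument: for a fixed budget of
 * receive-side work, how often can one coordinator log's state be broadcast?
 * "Work units" are abstract and illustrative, not measured Cassandra costs.
 */
public final class BroadcastBudgetSketch
{
    /**
     * @param budgetPerSec   work units we are willing to spend per second on one log's broadcasts
     * @param costPerMessage work units to receive and apply one broadcast message
     * @param rf             replication factor (participants in the shard)
     * @param senders        nodes broadcasting this log's state: 1 = coordinator only, rf = every replica
     * @return the highest broadcast frequency (rounds per second) that stays within budget
     */
    static double maxBroadcastHz(double budgetPerSec, double costPerMessage, int rf, int senders)
    {
        if (rf < 2 || senders < 1 || senders > rf || costPerMessage <= 0)
            throw new IllegalArgumentException("need rf >= 2, 1 <= senders <= rf, costPerMessage > 0");
        // Assumption: every sender sends one message to each of the other (rf - 1) participants per round.
        double messagesPerRound = (double) senders * (rf - 1);
        double workPerRound = messagesPerRound * costPerMessage;
        // Frequency is capped so that frequency * workPerRound <= budgetPerSec.
        return budgetPerSec / workPerRound;
    }

    public static void main(String[] args)
    {
        int rf = 3;
        double coordinatorOnly = maxBroadcastHz(1000, 1, rf, 1);  // 1000 / (1 * 2) = 500
        double everyReplica = maxBroadcastHz(1000, 1, rf, rf);    // 1000 / (3 * 2) = 166.67
        System.out.printf("coordinator only: %.1f Hz%n", coordinatorOnly);
        System.out.printf("every replica:    %.1f Hz%n", everyReplica);
        System.out.printf("ratio:            %.1f%n", coordinatorOnly / everyReplica);
    }
}
```

Under these assumptions, the two strategies differ in allowed frequency by a ratio equal to the number of senders, so letting every replica broadcast divides the affordable frequency by RF. Raising `costPerMessage` or RF lowers both limits, but the coordinator-only strategy keeps its RF-fold headroom.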



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: pr-h...@cassandra.apache.org
