ifesdjeen commented on code in PR #3197:
URL: https://github.com/apache/cassandra/pull/3197#discussion_r1549661290
##########
src/java/org/apache/cassandra/locator/LocalStrategy.java:
##########
@@ -53,4 +55,17 @@ public ReplicationFactor getReplicationFactor()
{
return RF;
}
+
+ /**
+ * For lazy initialisation. In some circumstances, we may want to
instantiate LocalStrategy without initialising
+ * DatabaseDescriptor; FQL replay is one such usage as we initialise the
KeyspaceMetadata objects, which now eagerly
+ * creates the replication strategy.
+ */
+ static class EntireRange
+ {
Review Comment:
Should we have these per partitioner maybe?
##########
src/java/org/apache/cassandra/tcm/ownership/DataPlacement.java:
##########
@@ -25,19 +25,28 @@
import java.util.Objects;
import java.util.Set;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.dht.IPartitioner;
import org.apache.cassandra.dht.Range;
import org.apache.cassandra.dht.Token;
import org.apache.cassandra.io.util.DataInputPlus;
import org.apache.cassandra.io.util.DataOutputPlus;
import org.apache.cassandra.locator.InetAddressAndPort;
+import org.apache.cassandra.locator.MetaStrategy;
import org.apache.cassandra.locator.Replica;
+import org.apache.cassandra.schema.ReplicationParams;
import org.apache.cassandra.tcm.Epoch;
import org.apache.cassandra.tcm.serialization.MetadataSerializer;
import org.apache.cassandra.tcm.serialization.Version;
public class DataPlacement
{
- public static final Serializer serializer = new Serializer();
+ private static final Serializer serializer = new
Serializer(DatabaseDescriptor.getPartitioner());
Review Comment:
Should we attempt to standartize and use `global` here? Also, should we
rename this private field to something like global serialiser?
##########
src/java/org/apache/cassandra/locator/CMSPlacementStrategy.java:
##########
@@ -115,8 +117,11 @@ public Set<NodeId> reconfigure(Set<NodeId> currentCms,
ClusterMetadata metadata)
}
}
- EndpointsForRange endpoints =
NetworkTopologyStrategy.calculateNaturalReplicas(EntireRange.entireRange.left,
-
EntireRange.entireRange,
+ // Although MetaStrategy has its own entireRange, it uses a custom
partitioner which isn't compatible with
+ // NTS. For that reason, we select replicas here using tokens
provided by the configured partitioner.
Review Comment:
nit: i understand this is technically correct to say NTS here, but maybe it
is worth saying "regular placements" or non-CMS placements?
##########
src/java/org/apache/cassandra/repair/messages/PrepareMessage.java:
##########
@@ -106,10 +106,7 @@ public void serialize(PrepareMessage message,
DataOutputPlus out, int version) t
message.parentRepairSession.serialize(out);
out.writeInt(message.ranges.size());
for (Range<Token> r : message.ranges)
- {
Review Comment:
I am not opposed to this, but could you add a comment for posterity about
why you have decided to remove `validate`?
##########
src/java/org/apache/cassandra/tcm/ownership/MovementMap.java:
##########
@@ -144,7 +146,8 @@ public MovementMap deserialize(DataInputPlus in, int
version) throws IOException
for (int i = 0; i < size; i++)
{
ReplicationParams params =
ReplicationParams.messageSerializer.deserialize(in, version);
- EndpointsByReplica endpointsByReplica =
EndpointsByReplica.serializer.deserialize(in, version);
+ IPartitioner partitioner = params.isMeta() ?
MetaStrategy.partitioner : IPartitioner.global();
Review Comment:
I have only briefly checked but it looks like we are not using movement maps
for meta keyspace, only diffs. Do we need this?
##########
src/java/org/apache/cassandra/locator/ReplicaLayout.java:
##########
@@ -383,17 +383,32 @@ static ReplicaLayout.ForRangeRead
forRangeReadLiveSorted(ClusterMetadata metadat
static EndpointsForRange forNonLocalStategyRangeRead(ClusterMetadata
metadata, KeyspaceMetadata keyspace, AbstractBounds<PartitionPosition> range)
{
- return
metadata.placements.get(keyspace.params.replication).reads.forRange(range.right.getToken()).get();
+ // MetaStrategy distributes the entire keyspace to all replicas. In
addition, its tables (currently only
+ // the dist log table) don't use the globally configured partitioner.
For these reasons we don't lookup the
+ // replicas using the supplied token as this can actually be of the
incorrect type (for example when
+ // performing Paxos repair).
+ Token searchToken = keyspace.params.replication.isMeta() ?
MetaStrategy.entireRange.right : range.right.getToken();
Review Comment:
When looking at things like that I am wondering if we should add a jira for
actually making it possible to run with different partitioners.
With paxos repair I can see the point: there the paxos table has a different
token, but token reads/writes, or for range reads - do we really need to have
this special-casing? Here, we should be using the bounds consistent with the
keyspace. Or at least removing this special casing does not break
`ConsistentBootstrapTest` or `ReconfigureCMSTest`.
##########
src/java/org/apache/cassandra/tcm/ClusterMetadataService.java:
##########
@@ -754,14 +752,9 @@ public MetadataSnapshots snapshotManager()
return snapshots;
}
- public ClusterMetadata sealPeriod()
+ public ClusterMetadata triggerSnapshot()
{
- return ClusterMetadataService.instance.commit(SealPeriod.instance,
- (ClusterMetadata
metadata) -> metadata,
- (code, reason) -> {
- // If the
transformation got rejected, someone else has beat us to seal this period
- return
ClusterMetadata.current();
- });
+ return
ClusterMetadataService.instance.commit(TriggerSnapshot.instance);
Review Comment:
Ah, interesting, looks like there's no way for the snapshot to get rejected
now. I can't see how this can cause anything bad though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]