Github user belliottsmith commented on a diff in the pull request:

    https://github.com/apache/cassandra/pull/267#discussion_r219133467
  
    --- Diff: src/java/org/apache/cassandra/locator/ReplicaPlans.java ---
    @@ -61,26 +73,62 @@
             return forSingleReplicaWrite(keyspace, token, replica);
         }
     
    +    public static ReplicaPlan.ForTokenWrite forLocalBatchlogWrite()
    +    {
    +        Token token = DatabaseDescriptor.getPartitioner().getMinimumToken();
    +        Keyspace systemKeyspace = Keyspace.open(SchemaConstants.SYSTEM_KEYSPACE_NAME);
    +        Replica localSystemReplica = SystemReplicas.getSystemReplica(FBUtilities.getBroadcastAddressAndPort());
    +
    +        ReplicaLayout.ForTokenWrite liveAndDown = ReplicaLayout.forTokenWrite(
    +        EndpointsForToken.of(token, localSystemReplica),
    +        EndpointsForToken.empty(token)
    +        );
    +
    +        return forWrite(systemKeyspace, ConsistencyLevel.ONE, liveAndDown, liveAndDown, writeAll);
    +    }
    +
         /**
          * Requires that the provided endpoints are alive.  Converts them to their relevant system replicas.
          * Note that the liveAndDown collection and live are equal to the provided endpoints.
    -     *
    -     * The semantics are a bit weird, in that CL=ONE iff we have one node provided, and otherwise is equal to TWO.
    -     * How these CL were chosen, and why we drop the CL if only one live node is available, are both unclear.
          */
    -    public static ReplicaPlan.ForTokenWrite forBatchlogWrite(Keyspace keyspace, Collection<InetAddressAndPort> endpoints) throws UnavailableException
    +    public static ReplicaPlan.ForTokenWrite forBatchlogWrite(String localDataCenter, ConsistencyLevel consistencyLevel) throws UnavailableException
         {
             // A single case we write not for range or token, but multiple mutations to many tokens
             Token token = DatabaseDescriptor.getPartitioner().getMinimumToken();
     
    +        TokenMetadata.Topology topology = StorageService.instance.getTokenMetadata().cachedOnlyTokenMap().getTopology();
    +        Multimap<String, InetAddressAndPort> localEndpoints = HashMultimap.create(topology.getDatacenterRacks().get(localDataCenter));
    +        String localRack = DatabaseDescriptor.getEndpointSnitch().getRack(FBUtilities.getBroadcastAddressAndPort());
    +
    +        // Replicas are picked manually:
    +        //  - replicas should be alive according to the failure detector
    +        //  - replicas should be in the local datacenter
    +        //  - choose min(2, number of qualifying candidates above)
    +        //  - allow the local node to be the only replica only if it's a single-node DC
    +        Collection<InetAddressAndPort> chosenEndpoints = new BatchlogManager.EndpointFilter(localRack, localEndpoints).filter();
    +
    +        if (chosenEndpoints.isEmpty())
    +        {
    +            if (consistencyLevel == ConsistencyLevel.ANY)
    +                chosenEndpoints = Collections.singleton(FBUtilities.getBroadcastAddressAndPort());
    +            else
    +                throw UnavailableException.create(ConsistencyLevel.ONE, 1, 0);
    +        }
    +
             ReplicaLayout.ForTokenWrite liveAndDown = ReplicaLayout.forTokenWrite(
    -                SystemReplicas.getSystemReplicas(endpoints).forToken(token),
    +                SystemReplicas.getSystemReplicas(chosenEndpoints).forToken(token),
                     EndpointsForToken.empty(token)
             );
    -        ConsistencyLevel consistencyLevel = liveAndDown.all().size() == 1 ? ConsistencyLevel.ONE : ConsistencyLevel.TWO;
    +
    +        // Batchlog is hosted by either one node or two nodes from different racks.
    +        consistencyLevel = liveAndDown.all().size() == 1 ? ConsistencyLevel.ONE : ConsistencyLevel.TWO;
    --- End diff ---
    
Are we sure it's OK (although consistent with prior behaviour) to require only CL.ONE if all nodes besides us are down?  The logic above suggests we should only require CL.ONE if we're a single-node DC, but we could have multiple failing nodes in our DC and be the only one left.  It seems like we should probably report insufficient consistency in this case, though this question probably deserves a separate ticket for discussion.
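
To make the concern concrete, here is a rough sketch (reusing `localEndpoints` and `liveAndDown` from the diff above; untested, and whether to count currently-down DC members in the denominator is itself up for debate) of how we might distinguish a genuine single-node DC from being the sole survivor of a larger one:

```java
// Sketch only: derive the required CL from the DC's total membership,
// not from however many live endpoints EndpointFilter happened to find.
int dcNodeCount = localEndpoints.size();    // all members of the local DC
int liveChosen  = liveAndDown.all().size(); // endpoints actually selected

// A multi-node DC with fewer than two live candidates is an availability
// problem; don't silently downgrade to CL.ONE (unless the caller used ANY).
if (dcNodeCount > 1 && liveChosen < 2 && consistencyLevel != ConsistencyLevel.ANY)
    throw UnavailableException.create(ConsistencyLevel.TWO, 2, liveChosen);

consistencyLevel = dcNodeCount == 1 ? ConsistencyLevel.ONE : ConsistencyLevel.TWO;
```

How that should interact with the CL.ANY fallback to the local node is exactly the sort of thing the separate ticket could pin down.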


---
