Ivan created IGNITE-12950:
-----------------------------

             Summary: Partitions validator must check sizes even if update counters are different
                 Key: IGNITE-12950
                 URL: https://issues.apache.org/jira/browse/IGNITE-12950
             Project: Ignite
          Issue Type: Improvement
          Components: cache
            Reporter: Ivan
             Fix For: 2.9


We have the following method in GridDhtPartitionsStateValidator:
{code:java}
public void validatePartitionCountersAndSizes(
    GridDhtPartitionsExchangeFuture fut,
    GridDhtPartitionTopology top,
    Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
    final Set<UUID> ignoringNodes = new HashSet<>();

    // Ignore just joined nodes.
    for (DiscoveryEvent evt : fut.events().events()) {
        if (evt.type() == EVT_NODE_JOINED)
            ignoringNodes.add(evt.eventNode().id());
    }

    AffinityTopologyVersion topVer = fut.context().events().topologyVersion();

    // Validate update counters.
    Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);

    if (!result.isEmpty())
        throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));

    // For sizes validation ignore also nodes which are not able to send cache sizes.
    for (UUID id : messages.keySet()) {
        ClusterNode node = cctx.discovery().node(id);
        if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
            ignoringNodes.add(id);
    }

    // TODO: Remove "if" clause in IGNITE-9451.
    if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) {
        // Validate cache sizes.
        result = validatePartitionsSizes(top, messages, ignoringNodes);

        if (!result.isEmpty())
            throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
    }
}
{code}
We should check partition sizes even if update counters are different. This 
would help when debugging problems in production.
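
One possible shape for this change is to collect the results of both checks before failing, instead of throwing on the first mismatch. The sketch below is a simplified, self-contained illustration and not Ignite code: the class ValidateBothSketch, the helper inconsistent(), and the use of String node ids in place of UUID are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for the validator; not the actual Ignite class. */
final class ValidateBothSketch {
    /** Returns the partitions whose per-node values disagree: partId -> (nodeId -> value). */
    static Map<Integer, Map<String, Long>> inconsistent(Map<Integer, Map<String, Long>> byPart) {
        Map<Integer, Map<String, Long>> res = new TreeMap<>();

        for (Map.Entry<Integer, Map<String, Long>> e : byPart.entrySet()) {
            if (e.getValue().values().stream().distinct().count() > 1)
                res.put(e.getKey(), new TreeMap<>(e.getValue()));
        }

        return res;
    }

    /** Runs both checks and returns a combined error message, or null if everything is consistent. */
    static String validate(
        Map<Integer, Map<String, Long>> counters,
        Map<Integer, Map<String, Long>> sizes
    ) {
        List<String> errors = new ArrayList<>();

        Map<Integer, Map<String, Long>> badCntrs = inconsistent(counters);

        if (!badCntrs.isEmpty())
            errors.add("Partitions update counters are inconsistent for " + badCntrs);

        // Sizes are validated regardless of the outcome of the counters check.
        Map<Integer, Map<String, Long>> badSizes = inconsistent(sizes);

        if (!badSizes.isEmpty())
            errors.add("Partitions cache sizes are inconsistent for " + badSizes);

        return errors.isEmpty() ? null : String.join(" ", errors);
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Long>> counters = Map.of(1, Map.of("node-A", 10L, "node-B", 9L));
        Map<Integer, Map<String, Long>> sizes = Map.of(1, Map.of("node-A", 5L, "node-B", 4L));

        // Both problems end up in a single message.
        System.out.println(validate(counters, sizes));
    }
}
```

In the real method the combined message would presumably be wrapped in a single IgniteCheckedException, and the mvccEnabled() guard and the version-based ignoring of nodes would still apply to the sizes check.
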
We must also print information about all copies of a partition that is in an 
inconsistent state. Currently, for a cache group with 3 backups, we can get a message like this:
{code:java}
Partition states validation has failed for group: CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
{code}
(The entry for part 4960 contains information about only 2 of the 4 copies.)
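
To illustrate the desired report, here is a minimal sketch that lists every copy of each inconsistent partition, including the copies that agree with each other. It is an illustration only: the class AllCopiesReportSketch, the method foldAllCopies(), and the node names are hypothetical, and the output merely imitates the "Part N: [node=counter ... ]" layout of the message above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical report formatter; not the actual fold() from GridDhtPartitionsStateValidator. */
final class AllCopiesReportSketch {
    /**
     * @param countersByPart Update counters of all copies: partId -> (nodeId -> counter).
     * @return Report that lists every copy of each partition whose counters disagree.
     */
    static String foldAllCopies(Map<Integer, Map<String, Long>> countersByPart) {
        StringBuilder sb = new StringBuilder();

        // Sorted maps keep the report order stable between runs.
        for (Map.Entry<Integer, Map<String, Long>> part : new TreeMap<>(countersByPart).entrySet()) {
            Map<String, Long> copies = new TreeMap<>(part.getValue());

            // Skip partitions where all copies hold the same counter.
            if (new HashSet<>(copies.values()).size() <= 1)
                continue;

            sb.append("Part ").append(part.getKey()).append(": [");

            // Print all copies, not only the ones that differ.
            for (Map.Entry<String, Long> copy : copies.entrySet())
                sb.append(copy.getKey()).append('=').append(copy.getValue()).append(' ');

            sb.append("] ");
        }

        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Long>> counters = Map.of(
            7, Map.of("node-A", 5L, "node-B", 5L, "node-C", 4L, "node-D", 5L),
            8, Map.of("node-A", 3L, "node-B", 3L, "node-C", 3L, "node-D", 3L)
        );

        // prints "Part 7: [node-A=5 node-B=5 node-C=4 node-D=5 ]"
        System.out.println(foldAllCopies(counters));
    }
}
```

With a report of this shape, a group with 3 backups would always show four entries per inconsistent partition.
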



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
