1 & 2. Actually, looking at the code as of the 2.7.5 release and at the
current master, it is the 'pickMvccCoordinator' function that returns the
coordinator (this is the same function that selects a node that is not a
client and whose Ignite version is >= 2.7). curCrd is then assigned the
return value of pickMvccCoordinator, so that node becomes the active MVCC
coordinator. So it looks like it does become active, but I am not sure of
the effect of that yet.
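
To make the point concrete, here is a minimal sketch of the selection rule as I read it. Everything in it is a stand-in: the Node class, its fields and the "first match wins" ordering are my assumptions for illustration, not Ignite's actual ClusterNode or pickMvccCoordinator code. What it shows is that the predicate never looks at whether any MVCC cache exists.

```java
import java.util.List;
import java.util.Optional;

public class CoordinatorPickSketch {
    /** Hypothetical stand-in for a cluster node; not Ignite's ClusterNode. */
    public static final class Node {
        public final String id;
        public final boolean client;
        public final int major;
        public final int minor;

        public Node(String id, boolean client, int major, int minor) {
            this.id = id;
            this.client = client;
            this.major = major;
            this.minor = minor;
        }

        /** Models the version check described above: version >= 2.7. */
        public boolean supportsMvcc() {
            return major > 2 || (major == 2 && minor >= 7);
        }
    }

    /**
     * Returns the first non-client node that supports MVCC.
     * Note that no "is MVCC enabled" flag is consulted anywhere.
     * Taking the first match is an assumption of this sketch.
     */
    public static Optional<Node> pickCoordinator(List<Node> topology) {
        return topology.stream()
            .filter(n -> !n.client && n.supportsMvcc())
            .findFirst();
    }

    public static void main(String[] args) {
        List<Node> topology = List.of(
            new Node("client-1", true, 2, 7),
            new Node("server-old", false, 2, 6),
            new Node("server-1", false, 2, 7));

        // Only server-1 passes both conditions.
        System.out.println(pickCoordinator(topology).map(n -> n.id).orElse("none"));
    }
}
```

With the topology above, the sketch picks server-1 even though nothing in the model says MVCC is in use.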

3. Assuming it is then active, it looks like there are two entry points
into recoveryBallotBoxes: through onDiscovery() and via
processRecoveryFinishedMessage().

Is it possible that onDiscovery() does not populate recoveryBallotBoxes,
since there is a curCrd0.local() check, so processing is only done if the
MVCC coordinator is local? In that case only a node that is actually the
MVCC coordinator would clear out recoveryBallotBoxes (this is the explicit
check that you mentioned).

But the population might happen through processRecoveryFinishedMessage(),
which does not do any isLocal() check and goes straight to processing the
message, since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage
always returns false?
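
Here is a toy model of that suspicion, only to illustrate the shape of the problem. It is not Ignite code: ToyNode, BallotBox, the method signatures and the rule "a box is removed once all expected votes are in" are assumptions made for this sketch. The guarded entry point opens a box only on the coordinator, while the unguarded one creates a box on any node that receives a vote.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BallotBoxLeakSketch {
    /** Votes for one departed node; 'expected' stays null until onDiscovery sets it. */
    public static final class BallotBox {
        Set<String> expected;
        final Set<String> votes = new HashSet<>();

        boolean complete() {
            return expected != null && votes.containsAll(expected);
        }
    }

    /** Toy node; a stand-in for illustration, not MvccProcessorImpl. */
    public static final class ToyNode {
        private final boolean coordinatorIsLocal;
        private final Map<String, BallotBox> recoveryBallotBoxes = new HashMap<>();

        public ToyNode(boolean coordinatorIsLocal) {
            this.coordinatorIsLocal = coordinatorIsLocal;
        }

        /** Guarded entry point: only the local coordinator opens a box. */
        public void onDiscovery(String leftNodeId, Set<String> voters) {
            if (!coordinatorIsLocal)
                return;

            BallotBox box = recoveryBallotBoxes.computeIfAbsent(leftNodeId, id -> new BallotBox());
            box.expected = new HashSet<>(voters);

            if (box.complete())
                recoveryBallotBoxes.remove(leftNodeId);
        }

        /** Unguarded entry point (assumed): any node receiving a vote creates a box. */
        public void processRecoveryFinishedMessage(String leftNodeId, String voterId) {
            BallotBox box = recoveryBallotBoxes.computeIfAbsent(leftNodeId, id -> new BallotBox());
            box.votes.add(voterId);

            if (box.complete())
                recoveryBallotBoxes.remove(leftNodeId);
        }

        public int openBoxes() {
            return recoveryBallotBoxes.size();
        }
    }

    public static void main(String[] args) {
        ToyNode coordinator = new ToyNode(true);
        ToyNode other = new ToyNode(false);

        // Simulate 1000 short-lived clients leaving the cluster.
        for (int i = 0; i < 1000; i++) {
            String left = "client-" + i;

            for (ToyNode n : new ToyNode[] {coordinator, other}) {
                n.onDiscovery(left, Set.of("server-1", "server-2"));
                n.processRecoveryFinishedMessage(left, "server-1");
                n.processRecoveryFinishedMessage(left, "server-2");
            }
        }

        System.out.println("coordinator: " + coordinator.openBoxes());
        System.out.println("other node:  " + other.openBoxes());
    }
}
```

In this model the coordinator ends with 0 open boxes and the other node with 1000, which would match what we see (the coordinator node does not leak, every other node does). Whether non-coordinator nodes really receive these messages in 2.7.5 is exactly the part I have not confirmed.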


Ivan Pavlukhin wrote
> 1. MVCC coordinator should not do anything when there is no MVCC
> caches, actually it should not be active in such case. Basically, MVCC
> coordinator is needed to have a consistent order between transactions.
> 2. In 2.7.5 "assigned coordinator" is always selected but it does not
> mean that it is active. MvccProcessorImpl.curCrd variable corresponds
> to active MVCC coordinator.
> 3. If that statement is true, then it should be rather easy to
> reproduce the problem by starting and stopping client nodes
> frequently. recoveryBallotBoxes was not assumed to be populated on
> nodes other than MVCC coordinator. If it happens then we have found a bug.
> Actually, the code in master is different and has an explicit check
> that recoveryBallotBoxes are populated only on MVCC coordinator.
> 
> Thu, 14 Nov 2019 at 15:42, mvkarp <liquid_ninja2k@>:
>>
>> Hi, after investigating I have a few questions regarding this issue.
>>
>> 1. Since I lack knowledge of what the MVCC coordinator is used for, are
>> you able to shed some light on its role and purpose? What does the MVCC
>> coordinator do, and why is one selected? Should an MVCC coordinator be
>> selected regardless of MVCC being disabled (i.e. is it used for any other
>> base features, and is this just the way Ignite is meant to work)?
>>
>> 2. Following on from this, after looking at the code of the
>> MvccProcessorImpl.java class in Ignite 2.7.5 on GitHub, it looks like an
>> MVCC coordinator is ALWAYS selected: one of the server nodes is assigned
>> as the MVCC coordinator regardless of whether there is a
>> TRANSACTIONAL_SNAPSHOT cache or not (mvccEnabled can be false but an MVCC
>> coordinator is still selected).
>>
>> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
>>
>> On line 861, in the assignMvccCoordinator method, it loops through all
>> nodes in the cluster with only these two conditions:
>>
>> *if (!node.isClient() && supportsMvcc(node))*
>>
>> It only checks that the node is not a client and that supportsMvcc is
>> true (which holds for all versions >= 2.7). It does not check mvccEnabled
>> at all.
>>
>>
>> Can you confirm the above is intentional/expected or if there is another
>> piece of code I am missing?
>>
>>
>> 3. As extra information, the node that happens to be selected as MVCC
>> coordinator does not get the leak. But every other client/server gets the
>> leak.
>>
>>
>>
>> Ivan Pavlukhin wrote
>> > Hi,
>> >
>> > I suspect the following here. Some node treats itself as the MVCC
>> > coordinator and creates a new RecoveryBallotBox each time a client node
>> > leaves. Some (maybe all) other nodes think that MVCC is disabled and
>> > do not send a vote (expected for the aforementioned ballot box) to the
>> > MVCC coordinator. The result is a memory leak.
>> >
>> > The following could be done:
>> > 1. Figure out why some node treats itself as the MVCC coordinator while
>> > others think that MVCC is disabled.
>> > 2. Try to introduce some defensive measures in the Ignite code to
>> > protect against the leak in a long-running cluster.
>> >
>> > As a last-resort workaround I can suggest writing custom code which
>> > clears the recoveryBallotBoxes map from time to time (most likely using
>> > reflection).
>> >
>> > Mon, 11 Nov 2019 at 08:53, mvkarp <liquid_ninja2k@>:
>> >>
>> >> We frequently stop and start clients in short-lived client JVM
>> >> processes, as required for our purposes. This seems to lead to a huge
>> >> number of PMEs (but no rebalancing) and topology changes
>> >> (topVer=300,000+).
>> >>
>> >> I still cannot figure out why this map won't clear (there are no
>> >> exceptions or errors at all in the entire log).
>> >>
>> >>
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Ivan Pavlukhin
>>
>>
>>
>>
>>
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





