I've linked additional related/dependent tickets to IGNITE-10078 that will
require investigation as soon as the main contribution is accepted.
On Tue, May 7, 2019 at 18:10, Alexei Scherbakov wrote:
Anton,
It's ready for review; look for the Patch Available status.
Yes, atomic caches are not fixed by this contribution. See [1]
[1] https://issues.apache.org/jira/browse/IGNITE-11797
On Tue, May 7, 2019 at 17:30, Anton Vinogradov wrote:
Alexei,
Got it.
Could you please let me know once the PR is ready for review?
Currently, I have some questions, but possibly they are caused by the non-final
PR (e.g., why does the atomic counter still ignore misses?).
On Tue, May 7, 2019 at 4:43 PM Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
Anton,
1) Extended counters will indeed answer the question of whether a partition can
be safely restored to a synchronized state on all owners.
The only condition is that at least one of the owners has no missed updates.
If not, the partition must be moved to the LOST state, see [1],
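To make that decision concrete, here is a rough sketch (hypothetical names, not
the actual PR code; 'counter' is the highest applied update counter and
'hasGaps' marks missed updates below it):

// Hypothetical sketch of the restore decision described above.
// OwnerState, counter and hasGaps are invented names, not Ignite API.
final class OwnerState {
    final long counter;     // highest update counter applied by the owner
    final boolean hasGaps;  // true if some updates below 'counter' were missed

    OwnerState(long counter, boolean hasGaps) {
        this.counter = counter;
        this.hasGaps = hasGaps;
    }
}

enum Decision { RESTORE, LOST }

static Decision decide(java.util.List<OwnerState> owners) {
    // The partition can be restored iff at least one owner has a complete
    // update history; that owner becomes the source for the others.
    for (OwnerState o : owners) {
        if (!o.hasGaps)
            return Decision.RESTORE;
    }
    return Decision.LOST; // no complete owner -> partition is lost
}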
Ivan,
1) I've checked the PR [1] and it looks like it does not solve the issue
either.
AFAICS, the main goal here (in the PR) is to produce
PartitionUpdateCounter#sequential, which can be false for all backups; which
backup should win in that case?
Is there an IEP or some other design page for this
Anton,
Automatic quorum-based partition drop may work as a partial workaround
for IGNITE-10078, but the discussed approach surely doesn't replace the
IGNITE-10078 activity. We still don't know what to do when a quorum can't
be reached (2 partitions have hash X, 2 have hash Y), and keeping
extended
Ivan, just to make sure ...
The discussed approach will fully solve the issue [1] if we also add
some strategy to reject partitions with missed updates (updateCnt==Ok,
Hash!=Ok).
For example, we may use a quorum strategy, where the majority wins.
Does that sound correct?
[1]
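Roughly, such a quorum strategy could look like the sketch below (hypothetical
code, only to make the idea concrete; owners with missed updates are assumed
to be filtered out beforehand). Note it also surfaces Ivan's tie concern: with
a 2 vs 2 hash split there is simply no winner:

// Hypothetical sketch of the "majority wins" strategy; not Ignite code.
// Input: the partition hash reported by each owner that has no missed
// updates. Output: the winning hash, or null when there is no strict
// majority (e.g. 2 owners with hash X and 2 with hash Y).
static Long majorityHash(java.util.List<Long> ownerHashes) {
    java.util.Map<Long, Integer> votes = new java.util.HashMap<>();
    for (Long h : ownerHashes)
        votes.merge(h, 1, Integer::sum);

    Long best = null;
    int bestVotes = 0;
    for (java.util.Map.Entry<Long, Integer> e : votes.entrySet()) {
        if (e.getValue() > bestVotes) {
            best = e.getKey();
            bestVotes = e.getValue();
        }
    }
    // Require a strict majority of all voting owners.
    return bestVotes * 2 > ownerHashes.size() ? best : null;
}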
Ivan,
Thanks for the detailed explanation.
I'll try to implement a PoC to check the idea.
On Mon, Apr 29, 2019 at 8:22 PM Ivan Rakov wrote:
But how to keep this hash?
I think we can just adopt the way partition update counters are stored.
Update counters are:
1) Kept and updated in heap, see
IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#pCntr (accessed during
regular cache operations, no page replacement latency issues)
2)
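For illustration, a minimal sketch of such an in-heap, incrementally maintained
hash kept next to the counter (invented class, not Ignite code; XOR of
per-entry hashes is order-independent, so concurrent updates can be applied in
any order):

// Hypothetical sketch: a partition hash maintained in heap next to the
// update counter, in the spirit of CacheDataStoreImpl#pCntr.
final class PartitionState {
    private final java.util.concurrent.atomic.AtomicLong cntr =
        new java.util.concurrent.atomic.AtomicLong();
    private final java.util.concurrent.atomic.AtomicLong hash =
        new java.util.concurrent.atomic.AtomicLong();

    /**
     * Called on every cache update. An entry hash covers key, value and
     * version; pass 0 for oldEntryHash on create and 0 for newEntryHash
     * on remove.
     */
    void onUpdate(long oldEntryHash, long newEntryHash) {
        cntr.incrementAndGet();
        // XOR out the old entry's contribution, XOR in the new one.
        hash.getAndUpdate(h -> h ^ oldEntryHash ^ newEntryHash);
    }

    long counter()       { return cntr.get(); }
    long partitionHash() { return hash.get(); }
}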
Ivan, thanks for the analysis!
>> With a pre-calculated partition hash value, we can automatically
detect inconsistent partitions on every PME.
Great idea; it seems this covers all broken sync cases.
It will check alive nodes in case the primary fails immediately,
and will check a rejoining node
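The PME-time check itself could be as simple as the following sketch
(hypothetical, assuming every owner reports its update counter and partition
hash with the PME messages):

// Hypothetical sketch of the PME-time consistency check: counters[i]
// and hashes[i] are reported by the i-th owner of the partition.
static boolean isConsistent(long[] counters, long[] hashes) {
    for (int i = 1; i < counters.length; i++) {
        if (counters[i] != counters[0] || hashes[i] != hashes[0])
            return false; // some owner diverges -> partition inconsistent
    }
    return true;
}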
Hi Anton,
Thanks for sharing your ideas.
I think your approach should work in general. I'll just share my
concerns about possible issues that may come up.
1) Equality of update counters doesn't imply equality of partition
contents under load.
For every update, the primary node generates update
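A toy illustration of that first concern (a pure-Java sketch, nothing
Ignite-specific): two backups each apply one of two in-flight updates; both
report the same counter value, yet their contents differ:

import java.util.HashMap;
import java.util.Map;

public class CounterVsContent {
    public static void main(String[] args) {
        // The primary generated two updates: (k1 -> a) and (k2 -> b).
        // Under load they may reach the two backups in a different order.
        Map<String, String> backup1 = new HashMap<>();
        Map<String, String> backup2 = new HashMap<>();

        backup1.put("k1", "a"); // backup1 applied the first update
        backup2.put("k2", "b"); // backup2 applied the second update first

        long cntr1 = 1, cntr2 = 1; // both applied exactly one update

        System.out.println(cntr1 == cntr2);          // true: counters match
        System.out.println(backup1.equals(backup2)); // false: contents differ
    }
}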
Igniters, and especially Ivan Rakov,
"Idle verify" [1] is a really cool tool for making sure that the cluster is
consistent.
1) But it requires operations to be paused during the cluster check.
On some clusters this check takes hours (3-4 hours in the cases I saw).
I've checked the code of "idle verify"